In message <[EMAIL PROTECTED]>, Phil Taylor <[EMAIL PROTECTED]> writes

On 29 Apr 2004, at 00:32, Steven Bennett wrote:

According to Apple docs (I'll take their word for it... ;):

    0x2028 -- Unicode line separator
    0x2029 -- Unicode paragraph separator

Thank you Steve,

Pardon my ignorance, but how do you know that you're dealing with Unicode
here, rather than the ASCII " (" and " )"?

I guess it's a problem for some charsets, but for Western ones the high byte of each two-byte character will be NULL. So you can scan the text, and if you find a NULL byte before the end of the string (I assume you know your string length) followed by a non-NULL byte, you can assume it's Unicode.


ABC is the characters 65, 66, 67
ABC in ASCII is 0x41, 0x42, 0x43
ABC in Unicode is 0x00,0x41, 0x00,0x42, 0x00,0x43, but in 16-bit lumps rather than 8-bit.
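
To see these byte layouts for yourself, here is a small Python sketch. It assumes "Unicode" above means big-endian UTF-16, which matches the byte order shown, and it uses the three characters with codes 65, 66, 67. The variable names are just for illustration.

```python
text = "ABC"  # the characters 65, 66, 67

ascii_bytes = text.encode("ascii")      # one byte per character
utf16_bytes = text.encode("utf-16-be")  # two bytes per character, high byte first

print(ascii_bytes.hex(" "))   # 41 42 43
print(utf16_bytes.hex(" "))   # 00 41 00 42 00 43

# The same two bytes read differently depending on the encoding you assume:
pair = bytes([0x20, 0x28])
as_ascii = pair.decode("ascii")       # space followed by "("
as_utf16 = pair.decode("utf-16-be")   # the single character U+2028
print(repr(as_ascii), hex(ord(as_utf16)))
```

The last three lines are the ambiguity Phil asked about: 0x20 0x28 is " (" as 8-bit text and the line separator as one 16-bit unit.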


Not an ideal solution, but for Western charsets this test has not failed me yet. I don't do much internationalised code, but there are places where I need to make a reasonable guess (I write debugging tools and don't know ahead of time what data will be presented to me), and there it works.

For people working with international character sets this trivial test may well fail in some cases.
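
A rough sketch of the test in Python, for anyone who wants to try it. The function name looks_like_utf16 is made up for this example, and this is only my reading of the scan described above, not code from a real tool.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Guess whether a buffer of known length holds 16-bit Unicode text.

    Heuristic: a NULL byte that is followed by a non-NULL byte before
    the end of the buffer suggests two-byte characters.
    """
    for i in range(len(data) - 1):
        if data[i] == 0 and data[i + 1] != 0:
            return True
    return False


western_utf16 = "ABC".encode("utf-16-be")     # 00 41 00 42 00 43
plain_ascii = b"ABC"                          # 41 42 43
japanese_utf16 = "日本語".encode("utf-16-be")  # 65 e5 67 2c 8a 9e, no NULL bytes

print(looks_like_utf16(western_utf16))   # True
print(looks_like_utf16(plain_ascii))     # False
print(looks_like_utf16(japanese_utf16))  # False, a miss
```

The third case is the kind of failure mentioned above: text outside the Western range need not contain any NULL high bytes, so the scan reports it as 8-bit.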

Stephen
--
Stephen Kellett
Object Media Limited    http://www.objmedia.demon.co.uk
RSI Information:        http://www.objmedia.demon.co.uk/rsi.html
To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
