On 29 Apr 2004, at 08:34, Stephen Kellett wrote:

In message <[EMAIL PROTECTED]>, Phil Taylor <[EMAIL PROTECTED]> writes

On 29 Apr 2004, at 00:32, Steven Bennett wrote:

According to Apple docs (I'll take their word for it... ;):

    0x2028 -- Unicode line separator
    0x2029 -- Unicode paragraph separator

Thank you Steve,

Pardon my ignorance, but how do you know that you're dealing with Unicode
here, rather than the ASCII " (" and " )" (bytes 0x20 0x28 and 0x20 0x29)?
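To make the ambiguity concrete, here is a small Python sketch. Python is only my choice for illustration; nothing in the thread uses it. It assumes the separators are stored as UTF-16 with the high byte first (big-endian, no byte-order mark). Stored that way, U+2028 and U+2029 are byte-for-byte identical to the ASCII pairs " (" and " )".

```python
# U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR
LINE_SEP = "\u2028"
PARA_SEP = "\u2029"

# Encode as UTF-16, high byte first (big-endian), with no byte-order mark.
line_bytes = LINE_SEP.encode("utf-16-be")
para_bytes = PARA_SEP.encode("utf-16-be")

# Read the same bytes back as ASCII: each pair is a space plus a bracket.
print(line_bytes.hex(), repr(line_bytes.decode("ascii")))  # 2028 ' ('
print(para_bytes.hex(), repr(para_bytes.decode("ascii")))  # 2029 ' )'
```

Nothing in the bytes themselves says which reading is intended.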

I guess it's a problem for some charsets, but for Western ones the high byte of each two-byte character will be NUL (0x00). So you can scan the text, and if you find a NUL byte before the end of the string (I assume you know your string length) followed by a non-NUL byte, you can assume it's Unicode.
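A minimal sketch of that heuristic, assuming the text sits in a Python `bytes` buffer of known length. `looks_like_utf16` is a hypothetical name. The last call shows the weak spot: a lone U+2028 encodes as 0x20 0x28, contains no NUL byte at all, and is therefore classed as plain 8-bit text.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Guess whether `data` is 16-bit Unicode.

    Returns True if a NUL byte is followed by a non-NUL byte anywhere
    before the end of the buffer. Only reliable for mostly-Western text,
    where the high byte of each character is 0x00.
    """
    for i in range(len(data) - 1):
        if data[i] == 0 and data[i + 1] != 0:
            return True
    return False


print(looks_like_utf16(b"abc"))                        # False
print(looks_like_utf16("abc".encode("utf-16-be")))     # True
print(looks_like_utf16("\u2028".encode("utf-16-be")))  # False: no NUL byte at all
```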


ABC is the characters 65, 66, 67
ABC in ASCII is 0x41, 0x42, 0x43
ABC in Unicode (UTF-16, high byte first) is 0x00,0x41, 0x00,0x42, 0x00,0x43, read in 16-bit units rather than 8-bit ones.
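The same three letters in each form, as a quick Python check. The uppercase letters A, B, C are the ones with codes 65, 66, 67. "utf-16-be" (high byte first) is assumed as the Unicode form, and `struct` reads the buffer back in 16-bit units.

```python
import struct

text = "ABC"

codes = [ord(c) for c in text]            # character codes
ascii_bytes = text.encode("ascii")        # one byte per character
utf16_bytes = text.encode("utf-16-be")    # two bytes per character, high byte first
units = struct.unpack(f">{len(text)}H", utf16_bytes)  # read back as 16-bit units

print(codes)                    # [65, 66, 67]
print(ascii_bytes.hex(" "))     # 41 42 43
print(utf16_bytes.hex(" "))     # 00 41 00 42 00 43
print([hex(u) for u in units])  # ['0x41', '0x42', '0x43']
```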

OK, I understand that. What was bothering me, though, is how Steven B's parser is going to deal with regular ASCII strings that include a space followed by a bracket. It's no problem when everything is Unicode or everything is ASCII, but if we are to have ASCII abc which may include Unicode strings, we will need a way of indicating this to the parser, will we not?
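One possible way to give the parser that indication, sketched in Python: carry an explicit encoding tag with each string so the parser never has to guess. `TaggedText`, `find_separators` and the tag values are hypothetical and are not taken from Steven B's parser. The same two bytes, " (", yield no separator when tagged as ASCII and a line separator when tagged as UTF-16.

```python
from dataclasses import dataclass

# Hypothetical tag values; any scheme that travels with the bytes would do.
ASCII = "ascii"
UTF16 = "utf-16-be"

# The two Unicode separators discussed in the thread.
SEPARATORS = {"\u2028": "line", "\u2029": "paragraph"}


@dataclass
class TaggedText:
    """Raw bytes plus an explicit encoding tag (hypothetical container)."""
    data: bytes
    encoding: str


def find_separators(text: TaggedText) -> list[tuple[int, str]]:
    """Return (character index, kind) for each line/paragraph separator.

    Decoding with the declared encoding first means ASCII " (" and " )"
    can never be mistaken for U+2028 / U+2029.
    """
    decoded = text.data.decode(text.encoding)
    return [(i, SEPARATORS[ch]) for i, ch in enumerate(decoded) if ch in SEPARATORS]


raw = b" ("
print(find_separators(TaggedText(raw, ASCII)))  # []
print(find_separators(TaggedText(raw, UTF16)))  # [(0, 'line')]
```

Whether the tag is a flag, a separate type code, or a byte-order mark at the start of the data is a design choice; the point is only that it travels with the bytes.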


Phil Taylor

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html
