RE: Byte Order Marks

2001-04-20 Thread Yves Arrouye
Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not UTF16_BigEndian? ICU does not do Unicode-signature or other encoding detection as part of a converter. When you get text from some protocol, you need to instantiate a converter according to what you know about the

RE: Byte Order Marks

2001-04-20 Thread Yves Arrouye
On Thu, Apr 19, 2001 at 06:24:47PM -0700, Markus Scherer wrote: On the other hand, if you get a file from your platform and it is in 16-bit Unicode, then you would appreciate the convenience of the auto-endian alias. But nothing should be spitting out platform-endian UTF-16! In the

Re: Byte Order Marks

2001-04-20 Thread Markus Scherer
Yves, we are thinking about a general API for encoding detection that could initially just check for BOM/Unicode signatures. I believe we have a feature request for this already. Mark and I just brainstormed about what we may want an API to look like. The reason for doing what ICU is doing

Byte Order Marks

2001-04-19 Thread Tomas McGuinness
Hi, A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is it safe to assume any particular order (i.e. big or little endian)? I am writing a function to rearrange from big to little endian, but without a byte order mark I'm not sure what the order is. Is there any
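The rearranging function mentioned in this message could look roughly like the sketch below. It is only an illustration: the name swap_utf16_byte_order is hypothetical, and the code assumes the input is a whole number of 16-bit units. It swaps bytes blindly and does not try to work out which order the input is in.

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (big-endian <-> little-endian).

    Hypothetical helper for illustration; it does not detect the input order.
    """
    if len(data) % 2 != 0:
        raise ValueError("16-bit text must contain an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each unit moves to the front
    swapped[1::2] = data[0::2]  # first byte of each unit moves to the back
    return bytes(swapped)
```

Applying the function twice returns the original bytes, so the same helper converts in either direction.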

Re: Byte Order Marks

2001-04-19 Thread Markus Scherer
There is an RFC about UTF-16 that explains this: If the text is labeled by the protocol as charset=UTF-16 then the first two bytes are the byte order mark charset=UTF-16BE then it is big-endian and the first two bytes are just text charset=UTF-16LE then it is little-endian and the first two
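The labelling rules described in this message can be sketched as follows. This is a minimal illustration, not ICU's behaviour: the function name decode_labeled_utf16 is hypothetical, and the big-endian fallback for unsigned text labelled UTF-16 is an assumption taken from the advice given elsewhere in this thread.

```python
import codecs


def decode_labeled_utf16(data: bytes, charset: str) -> str:
    """Decode 16-bit Unicode text according to its protocol label (hypothetical helper)."""
    label = charset.upper()
    if label == "UTF-16BE":
        # Byte order is given by the label; the first two bytes are just text.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        return data.decode("utf-16-le")
    if label == "UTF-16":
        # Look for a byte order mark and strip it when present.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # Assumption: with no signature, fall back to big-endian.
        return data.decode("utf-16-be")
    raise ValueError(f"unsupported charset label: {charset!r}")
```

The check is done by hand because Python's own "utf-16" codec falls back to the platform's native byte order when no byte order mark is present, which is the behaviour being questioned in this thread.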

RE: Byte Order Marks

2001-04-19 Thread Yves Arrouye
If you don't have any clue about the byte order, but you know it is UTF-16, then assume BE. Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not UTF16_BigEndian? I know that was a difference between ICU and my library, and when I asked this question a while ago I was told that despite

Fwd: Re: Byte Order Marks

2001-04-19 Thread Asmus Freytag
Date: Thu, 19 Apr 2001 12:59:43 -0700 To: Tomas McGuinness [EMAIL PROTECTED] From: Asmus Freytag [EMAIL PROTECTED] Subject: Re: Byte Order Marks At 02:58 PM 4/19/01 +0200, you wrote: If it's absent, is it safe to assume any particular order (i.e. big or little endian)? The default order is Big

Re: Byte Order Marks

2001-04-19 Thread Markus Scherer
Yves Arrouye wrote: If you don't have any clue about the byte order, but you know it is UTF-16, then assume BE. Then why is ICU mapping UTF-16 to UTF16_PlatformEndian and not UTF16_BigEndian? ICU does not do Unicode-signature or other encoding detection as part of a converter. When you

Re: Byte Order Marks

2001-04-19 Thread David Starner
On Thu, Apr 19, 2001 at 06:24:47PM -0700, Markus Scherer wrote: On the other hand, if you get a file from your platform and it is in 16-bit Unicode, then you would appreciate the convenience of the auto-endian alias. But nothing should be spitting out platform-endian UTF-16! In the case that

Byte Order Marks

2001-04-10 Thread Tomas McGuinness
Hi, When looking at a document, would it be safe to assume that, if you found any of the following Byte Order Marks: * 0xFFFE (UCS-2 Little Endian) * 0xFEFE (UCS-2 Big Endian) * 0xEFBBBF (UTF-8), the document is encoded with that encoding format? That means that if I found

Re: Byte Order Marks

2001-04-10 Thread DougEwell2
In a message dated 2001-04-10 3:04:09 Pacific Daylight Time, [EMAIL PROTECTED] writes: "When looking at a document would it be safe to assume that if you found any of the following Byte Order Marks * 0xFFFE (UCS-2 Little Endian) * 0xFEFE (UCS-2 Big Endian)" should be 0xFEFF
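A signature check along the lines discussed in these two messages might look like the sketch below. The name sniff_signature is hypothetical. The code uses the corrected big-endian signature FE FF and treats a match as a strong hint rather than proof, since the opening bytes of a file in some other encoding could happen to look like a signature (FF FE, for example, also begins the UTF-32 little-endian signature).

```python
import codecs

# Byte signatures checked at the start of the data, with the encoding each suggests.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_signature(data: bytes):
    """Return the encoding suggested by a leading signature, or None (hypothetical helper)."""
    for signature, encoding in SIGNATURES:
        if data.startswith(signature):
            return encoding
    return None
```

A caller that gets None back still needs some other source of information, such as a protocol label, to pick an encoding.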