I did a little checking. A Unicode (UTF-16) text file is supposed to start 
with a Byte Order Mark (BOM). The BOM is either 0xFF 0xFE or 0xFE 0xFF, 
depending on the byte order of the 16-bit wide characters.

So if the first two bytes of a file are 0xFF 0xFE, the file contains 
Unicode characters with the low-order byte first (little-endian). Starting 
with 0xFE 0xFF means high-order byte first (big-endian).
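
As a rough illustration of that check, here is a minimal Python sketch. The 
function name detect_utf16_bom is made up for this example; the BOM 
constants come from Python's standard codecs module. It only looks at the 
first two bytes and makes no attempt to guess the encoding of a file that 
has no BOM.

```python
import codecs


def detect_utf16_bom(data: bytes):
    """Return 'little', 'big', or None based on the first two bytes.

    Hypothetical helper for illustration only.
    """
    head = data[:2]
    if head == codecs.BOM_UTF16_LE:  # b'\xff\xfe': low-order byte first
        return "little"
    if head == codecs.BOM_UTF16_BE:  # b'\xfe\xff': high-order byte first
        return "big"
    return None  # no BOM found; the file may be plain single-byte text


if __name__ == "__main__":
    print(detect_utf16_bom(b"\xff\xfeG\x00"))  # BOM followed by 'G', low byte first
    print(detect_utf16_bom(b"GET /index.html"))  # ordinary ASCII log line
```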

Unicode defines ASCII as the first 128 codes, i.e. an ASCII character gets 
a high-order zero byte added to it.
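
To make the zero byte concrete, this small Python sketch encodes a short 
ASCII string both ways using the standard "utf-16-le" and "utf-16-be" 
codecs. The sample string is arbitrary; these codec names do not write a 
BOM, so only the character bytes are shown.

```python
text = "GET"  # arbitrary ASCII sample

# Low-order byte first: each ASCII byte is followed by a zero byte.
le = text.encode("utf-16-le")

# High-order byte first: each ASCII byte is preceded by a zero byte.
be = text.encode("utf-16-be")

if __name__ == "__main__":
    print(le.hex(" "))
    print(be.hex(" "))
```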

Reading the files is not that difficult except for one giant problem. 
What do you do when you see a non-ASCII character?
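
One possible answer, offered only as a sketch and not as what any 
particular log analyzer does: convert the file back to single-byte text 
and substitute a placeholder for anything outside ASCII. The function name 
utf16_log_to_ascii is hypothetical, and the choice of "?" as the 
replacement is just one option (Python's "backslashreplace" error handler 
would keep the code point visible instead).

```python
import codecs


def utf16_log_to_ascii(data: bytes, errors: str = "replace") -> bytes:
    """Convert BOM-prefixed UTF-16 bytes to ASCII bytes.

    Hypothetical helper: non-ASCII characters are handled according to
    `errors` ("replace" turns each one into b"?"). Data without a BOM is
    assumed to be single-byte text already and is returned unchanged.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        return data
    return text.encode("ascii", errors=errors)


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "caf\u00e9".encode("utf-16-le")
    print(utf16_log_to_ascii(sample))
    print(utf16_log_to_ascii(sample, errors="backslashreplace"))
```

Whether dropping information this way is acceptable depends on what the 
file is used for afterwards.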

Jason

On 4/28/99 7:35 PM Jeremy Wadsack ([EMAIL PROTECTED]) wrote:

>Actually this is one place where they're starting to get it right.
>Notepad and WordPad both support Unicode, which is a double-byte
>character system (DBCS). I'm not sure how widely supported it is or if
>it's compatible with the other DBCS (used by IBM and other more
>internationally aware companies). There is a setting somewhere, I think,
>to tell the system to default to standard (SBCS) instead of Unicode
>(check the File|Save As file-type list).
>
>We all just need to start thinking in DBCS (or not expect a character to
>necessarily be one byte). It makes porting to Japanese, Korean, Mandarin,
>and Arabic much easier!
>
>
>Ian Zimmerman wrote:
>
>> Keith Purtell <[EMAIL PROTECTED]> writes:
>>
>> > Somewhere in the process of handling the first copy of the log, I
>> > probably used software (Notepad or WordPad) that damaged the file.
>> > Another "user error." Thank you, Stephen and Jason!
>>
>> I certainly hope you're being sarcastic about the software here .. if
>> Microsoft really made people _expect_ such stuff to happen, there's
>> not much hope.


-----------------
[EMAIL PROTECTED]
-----------------
Dr. Seuss books . . . can be read and enjoyed on several levels. For
example, 'One Fish Two Fish, Red Fish Blue Fish' can be deconstructed
as a searing indictment of the narrow-minded binary counting system.
  -- Peter van der Linden, Expert C Programming, Deep C Secrets


