On Thu, 9 December 2004, 2:49, Leszek Gawron said:
> Bertrand Delacretaz wrote:
>> On 9 Dec 04, at 09:21, Leszek Gawron wrote:
>>
>>> ...By the way: it is a little bit different on win32. Some tools
>>> detect utf encoding by checking for BOM. If there is none - ANSI
>>> encoding is assumed...
>>
>>
>> AFAIU this is ok for 16-bit based encodings, not for UTF-8.
>>
>> -Bertrand
> http://www.xencraft.com/resources/unicodebom.html
> <quote>
> Even though UTF-8 does not need a BOM to indicate endianness, Microsoft
> Notepad began prepending a BOM to its UTF-8 text files. Actually, it is
> a conversion of U+FEFF to an encoding as UTF-8 serialized bytes: EF BB
> BF (or in 4GL: CHR(15711167)). There is some value in the BOM being used
> as a file signature, indicating the plain text file is encoded as
> Unicode UTF-8, as opposed to some other code page. That particular
> 3-byte sequence is unlikely to represent data in any other code page,
> given the text is supposed to be human readable in some language.
> However, there is some small possibility that it represents some string
> in some code page... Because Microsoft did it, and there is so much
> Notepad data out there, the UTF-8 BOM became a de facto standard and
> then a de jure standard. (Although the BOM is optional.)
> </quote>
>
> M$ again.

This is the standard:

http://www.zvon.org/tmRFC/RFC3023/Output/chapter8.html#sub1

:-D

Best Regards,

Antonio Gallardo.
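
For illustration, here is a minimal sketch of the BOM check discussed in this thread, written in Python as an assumption (nobody in the thread names a language). The helper name `sniff_bom` is hypothetical and is not taken from any tool mentioned above. It only reports a signature when one is present; what to assume when there is none (ANSI, UTF-8, or something else) is left to the caller.

```python
import codecs

# Known byte order marks, longest first. Order matters: the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

With this sketch, `sniff_bom(b"\xef\xbb\xbfhello")` returns `("utf-8", 3)`, and plain `b"hello"` returns `(None, 0)`. Python's built-in `utf-8-sig` codec can also be used to decode UTF-8 text while dropping a leading BOM if one is there.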
