On Jan 25, 10:30 am, [EMAIL PROTECTED] (Chas. Owens) wrote:
> On Jan 25, 2008 10:06 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> snip
> > Great! both worked. The thing I still don't understand is that in the
> > file the BOM is FFFE not FEFF
> snip
>
> This is because it is little endian; if it were a big endian file it
> would be FEFF. The character is the same, but the order of the bytes
> changes depending on the endian-ness of the file. The BOM isn't a
> marker that says the file is one endian or another; it is a character
> that is known in advance and lets you easily tell which endian the
> file is.
>
> snip
> > so I have already tried to use s/^x{FFFE}//; with no success but
> > your feedback worked with the s/^{FEFF}//; it is in reverse order
> > for some reason.
> snip
>
> Perl uses the Unicode character number for "\x{}", so ZERO WIDTH
> NO-BREAK SPACE is "\x{FEFF}" even if it is written to the file in
> little-endian bytes FF FE. Avoid confusing the encoding of Unicode
> with Unicode itself. For instance, the UTF-8 encoding of "\x{FEFF}"
> is EF BB BF.
>
> snip
> > Now I need to read
> > further into "zero-width no-break space", not sure that I understand
> > why it is called that and not BOM. Dealing with unicode at the moment
> > is over my head a bit so thanks very much for the fix to what was a
> > simple change. Off to find more material to read about this subject
> > matter, thanks again!
> snip
>
> from http://en.wikipedia.org/wiki/Byte_Order_Mark
>   In most character encodings the BOM is a pattern which
>   is unlikely to be seen in other contexts (it would usually
>   look like a sequence of obscure control codes). If a BOM
>   is misinterpreted as an actual character within Unicode
>   text then it will generally be invisible due to the fact it is a
>   zero-width no-break space. Use of the U+FEFF character
>   for non-BOM purposes has been deprecated in Unicode
>   3.2 (which provides an alternative, U+2060, for those
>   other purposes), allowing U+FEFF to be used solely with
>   the semantic of BOM.
>
> Also, there is a nice chart here:
> http://www.websina.com/bugzero/kb/unicode-bom.html
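
For anyone following along, the difference between the character U+FEFF and its bytes on disk can be checked with a short script. This is only a minimal sketch, assuming the core Encode module is available; the helper names hex_bytes and strip_bom are made up for illustration and are not from the thread, and the sample bytes stand in for a real file.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+FEFF is a single character; its bytes depend on the encoding.
my $bom = "\x{FEFF}";

# hex_bytes (hypothetical helper): encode a string and return its
# bytes as space-separated uppercase hex.
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf('%02X', $_) } unpack('C*', $bytes);
}

# strip_bom (hypothetical helper): remove a leading U+FEFF from text
# that has already been decoded into Perl characters.
sub strip_bom {
    my ($text) = @_;
    $text =~ s/^\x{FEFF}//;
    return $text;
}

print hex_bytes('UTF-16LE', $bom), "\n";   # FF FE
print hex_bytes('UTF-16BE', $bom), "\n";   # FE FF
print hex_bytes('UTF-8',    $bom), "\n";   # EF BB BF

# Sample bytes as they might appear at the start of a little-endian
# file: the BOM (FF FE) followed by "hi".
my $raw  = "\xFF\xFE" . "h\x00i\x00";
my $text = strip_bom(decode('UTF-16LE', $raw));
print "$text\n";                           # hi
```

The substitution matches "\x{FEFF}" because it runs on decoded characters, not on the raw bytes, which is why a pattern written as FFFE does not match even though the file starts with FF FE.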

Thanks for the feedback... I will look into the sites you sent for additional information. Thanks!