On Jan 25, 2008 10:06 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
snip
> Great! both worked.  The thing I still don't understand is that in the
> file the BOM is FFFE not FEFF
snip

This is because the file is little-endian; if it were big-endian, the
bytes would be FE FF.  The character is the same, but the order of the
bytes changes depending on the endianness of the file.  The BOM isn't
a separate flag that declares the byte order; it is a character whose
value (U+FEFF) is known in advance, so the order in which its two
bytes appear tells you the endianness of the file.

snip
> so I have already tried to use s/
> ^x{FFFE}//; with no success but your feedback worked with the s/
> ^{FEFF}//; it is in reverse order for some reason.
snip

Perl uses the Unicode character number in "\x{}", so ZERO WIDTH
NO-BREAK SPACE is "\x{FEFF}" even when it is written to the file as
the little-endian bytes FF FE.  Avoid confusing an encoding of Unicode
with Unicode itself.  For instance, the UTF-8 encoding of "\x{FEFF}"
is EF BB BF.
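
A small sketch of the difference, again with the core Encode module.
The sample bytes and variable names are invented for the example; it
assumes the data is decoded explicitly as 'UTF-16LE', which leaves the
BOM in the string as an ordinary leading U+FEFF character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might sit in a little-endian UTF-16 file:
# the BOM (FF FE) followed by "hi".
my $raw = "\xFF\xFE" . "h\x00i\x00";

# Decode bytes into characters.  With an explicit 'UTF-16LE' the BOM
# is kept as a normal character, U+FEFF, at the start of the string.
my $text = decode('UTF-16LE', $raw);

# Match the character number, not the on-disk byte order.
(my $clean = $text) =~ s/^\x{FEFF}//;

# The same character serialized as UTF-8 is EF BB BF.
my $utf8_hex = uc(unpack('H*', encode('UTF-8', "\x{FEFF}")));

print "length before: ", length($text),  "\n";   # 3
print "length after:  ", length($clean), "\n";   # 2
print "UTF-8 bytes of U+FEFF: $utf8_hex\n";      # EFBBBF
```

The substitution works on the decoded string, which is why the
pattern is written with FEFF regardless of how the file stores it.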

snip
>Now I need to read
> further into "zero-width no-break space", not sure that I understand
> why it is called that and not BOM.  Dealing with unicode at the moment
> is over my head a bit so thanks very much for the fix to what was a
> simple change.  Off to find more material to read about this subject
> matter, thanks again!
snip

From http://en.wikipedia.org/wiki/Byte_Order_Mark:
    In most character encodings the BOM is a pattern which
    is unlikely to be seen in other contexts (it would usually
    look like a sequence of obscure control codes). If a BOM
    is misinterpreted as an actual character within Unicode
    text then it will generally be invisible due to the fact it is a
    zero-width no-break space. Use of the U+FEFF character
    for non-BOM purposes has been deprecated in Unicode
    3.2 (which provides an alternative, U+2060, for those
    other purposes), allowing U+FEFF to be used solely with
    the semantic of BOM.

Also, there is a nice chart here:
http://www.websina.com/bugzero/kb/unicode-bom.html
