On Jan 25, 10:30 am, [EMAIL PROTECTED] (Chas. Owens) wrote:
> On Jan 25, 2008 10:06 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> snip
> > Great! both worked.  The thing I still don't understand is that in the
> > file the BOM is FFFE not FEFF
>
> snip
>
> This is because it is little endian; if it were a big endian file it
> would be FEFF.  The character is the same, but the order of the bytes
> changes depending on the endianness of the file.  The BOM isn't a
> marker that says the file is one endian or another; it is a character
> that is known in advance and that lets you easily tell which byte order
> the file uses.
>
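A small sketch of the point above, assuming the core Encode module is
available. It writes the same character, U+FEFF, in both byte orders and
then guesses the order from the leading bytes. The helper name
guess_utf16_order is made up for illustration, and it only looks at the
two UTF-16 cases (a UTF-32LE file also starts with FF FE).

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character, U+FEFF, serialized in both byte orders.
my $bom = "\x{FEFF}";
my $le  = encode('UTF-16LE', $bom);    # bytes FF FE
my $be  = encode('UTF-16BE', $bom);    # bytes FE FF

# Print each byte string as space-separated hex.
printf "UTF-16LE: %s\n", join ' ', map { sprintf '%02X', ord } split //, $le;
printf "UTF-16BE: %s\n", join ' ', map { sprintf '%02X', ord } split //, $be;

# Hypothetical helper: infer the byte order from the first two bytes.
sub guess_utf16_order {
    my ($bytes) = @_;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;    # no UTF-16 BOM found
}
```

Calling guess_utf16_order on the first bytes read from a file in binary
mode would give the encoding name to pass to decode().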
> snip
> > so I have already tried to use s/^\x{FFFE}//; with no success but your
> > feedback worked with the s/^\x{FEFF}//; it is in reverse order for some
> > reason.
>
> snip
>
> Perl uses the Unicode character number for "\x{}", so ZERO WIDTH
> NO-BREAK SPACE is "\x{FEFF}" even if it is written to the file in
> little-endian bytes FF FE.  Avoid confusing the encoding of Unicode
> with Unicode itself.  For instance, the UTF-8 encoding of "\x{FEFF}"
> is EF BB BF.
>
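To make that concrete, here is a minimal sketch, again assuming the core
Encode module. The sample bytes in $raw are invented for illustration
("Hi" as little-endian UTF-16 with a BOM). After decoding, the BOM is
the single character U+FEFF, so the substitution uses \x{FEFF}
regardless of how the bytes were ordered on disk.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Invented sample data: "Hi" as little-endian UTF-16 with a leading BOM.
my $raw = "\xFF\xFE" . "H\x00i\x00";

# Decoding turns bytes into characters. With an explicit LE/BE encoding
# name the BOM is kept as an ordinary character, U+FEFF.
my $text = decode('UTF-16LE', $raw);

# Match the character number, not the byte order in the file.
$text =~ s/\A\x{FEFF}//;
print "text: $text\n";

# The same character encoded as UTF-8 is three bytes: EF BB BF.
my $utf8 = encode('UTF-8', "\x{FEFF}");
printf "UTF-8 bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

Running s/\A\xFF\xFE// on the raw bytes before decoding would also work,
but then the pattern has to change for every encoding.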
> snip
> > Now I need to read
> > further into "zero-width no-break space", not sure that I understand
> > why it is called that and not BOM.  Dealing with unicode at the moment
> > is over my head a bit so thanks very much for the fix to what was a
> > simple change.  Off to find more material to read about this subject
> > matter, thanks again!
>
> snip
>
> from http://en.wikipedia.org/wiki/Byte_Order_Mark
>     In most character encodings the BOM is a pattern which
>     is unlikely to be seen in other contexts (it would usually
>     look like a sequence of obscure control codes). If a BOM
>     is misinterpreted as an actual character within Unicode
>     text then it will generally be invisible due to the fact it is a
>     zero-width no-break space. Use of the U+FEFF character
>     for non-BOM purposes has been deprecated in Unicode
>     3.2 (which provides an alternative, U+2060, for those
>     other purposes), allowing U+FEFF to be used solely with
>     the semantic of BOM.
>
> Also, there is a nice chart
> here: http://www.websina.com/bugzero/kb/unicode-bom.html

Thanks for the feedback...  I will look into the sites you sent for
additional information. Thanks!

