Re: dealing unicode output

[EMAIL PROTECTED] Fri, 25 Jan 2008 07:13:30 -0800

On Jan 24, 7:35 pm, [EMAIL PROTECTED] (Dr.Ruud) wrote:
> [EMAIL PROTECTED] schreef:
>
> > [...] I'm reading a Unicode UTF-16LE file and have successfully
> > done so but with one issue.  When I print the first line of input the
> > BOM is still there...
>
> By specifying the "le", you express that you already know the byte
> order.
> The U+FEFF is then read as the "zero-width no-break space", and not
> as the BOM.
>
> So either toss the "le" or toss the BOM character: s/^\x{FEFF}//;
>
> --
> Affijn, Ruud
>
> "Gewoon is een tijger."


Great! Both worked.  The thing I still don't understand is that in the
file the BOM is FFFE, not FEFF, so I had already tried
s/^\x{FFFE}//; with no success, but your suggestion of
s/^\x{FEFF}//; worked; it is in reverse order for some reason.  Now I
need to read further about the "zero-width no-break space", as I'm not
sure I understand why it is called that and not a BOM.  Dealing with
Unicode is a bit over my head at the moment, so thanks very much for
what turned out to be a simple fix.  Off to find more material to read
on this subject, thanks again!
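
For anyone following along, here is a minimal sketch of the two fixes
side by side.  It assumes the core Encode module and uses a made-up
in-memory byte string (a BOM followed by "Hi") in place of the real
file, so the variable names and data are illustrative only.  It also
hints at the "reverse order" puzzle: a hex dump shows the bytes FF FE
because little-endian stores the low byte first, but the character
those two bytes decode to is U+FEFF, and \x{...} in a regex names the
character, not the bytes on disk.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: the raw bytes of a UTF-16LE file that
# starts with a BOM (FF FE on disk), followed by the text "Hi".
my $bytes = "\xFF\xFE" . "H\x00" . "i\x00";

# Fix 1: keep the explicit "LE".  The decoder then treats U+FEFF as an
# ordinary character (zero-width no-break space), so strip it by hand.
my $explicit = decode('UTF-16LE', $bytes);
(my $stripped = $explicit) =~ s/^\x{FEFF}//;

# Fix 2: drop the "LE".  Plain "UTF-16" reads the BOM to pick the byte
# order and does not return it as part of the text.
my $auto = decode('UTF-16', $bytes);

printf "first char with LE: U+%04X\n", ord $explicit;   # U+FEFF
print  "stripped by hand:   $stripped\n";               # Hi
print  "decoded as UTF-16:  $auto\n";                   # Hi
```

When reading from a file rather than a string, the same choice should
apply to the I/O layer: '<:encoding(UTF-16LE)' plus the substitution,
or '<:encoding(UTF-16)' on its own.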


