Re: utf8 questions.

Jay Savage Mon, 18 Apr 2005 08:42:53 -0700

On 4/18/05, Rajarshi Das <[EMAIL PROTECTED]> wrote:
> Hi,
> I am using perl-5.8.6 on z/OS.
> 1) What is the BOM on z/OS ? Basically, I can't print the chars "\xFE\xFF".
> Even though \xFE is defined as Latin Capital Letter U with Acute, the char
> doesn't display. Also, \xFF isn't defined.
> 
> 2) What is the difference between the utf8::encode and utf8::upgrade
> routines ?
> e.g. $a = 'hello';
> utf8::upgrade($a);
> 
> $a = "\xFE\xFF";
> utf8::encode($a);
> 
> Should I use 'encode' when the scalar contains bytes and I need to convert
> those bytes into utf8 bytes (as in byte representation in unicode) ?
> And use 'upgrade' when the scalar contains a normal string that I want to
> convert to a utf8 string of characters ?
> 
> Thanks in advance,
> Rajarshi.
>


1) This depends on your character display and your system's
Unicode support, but see perldoc perlebcdic.  Also make sure that you
are actually using utf8, through I/O layers, the utf8 pragma, or the
utf8:: functions.
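
As a rough illustration of why "\xFE\xFF" is two separate characters
rather than a BOM: the BOM is the single code point U+FEFF, and FE FF
is only its UTF-16BE byte form.  The sketch below uses the core Encode
module and assumes an ASCII-based platform; the variable names are made
up for the example, and the behaviour on z/OS (EBCDIC) is not verified
here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FEFF is the code point used as the byte order mark.
my $bom = "\x{FEFF}";

# Hex dump of the octets each encoding produces for that one character.
my $utf16_hex = unpack 'H*', encode('UTF-16BE', $bom);
my $utf8_hex  = unpack 'H*', encode('UTF-8',    $bom);

print "UTF-16BE: $utf16_hex\n";
print "UTF-8:    $utf8_hex\n";
```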

2) These functions are related, but they are not interchangeable.
utf8::upgrade changes only Perl's internal representation: the string
keeps the same characters, and the UTF-8 flag is turned on.
utf8::encode replaces each character with the octets of its UTF-8
encoding and leaves the UTF-8 flag off; this can be important when
switching back and forth between different encodings.  utf8::upgrade
also returns the number of octets needed to represent the string as
UTF-8, which can be handy.  Do not, though, pass bytes that are already
UTF-8 encoded to utf8::encode.  It makes no attempt to detect an
existing encoding: it treats every byte as a character and encodes it
again, yielding double-encoded data.  In your example each of the two
characters is encoded separately; the pair is not treated as a BOM.
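
A minimal sketch of the difference, assuming an ASCII-based platform
(the numbers will differ on EBCDIC, where "\xE9" is a different
character).  The sample string and variable names are illustrative only
and are not taken from the original question.

```perl
use strict;
use warnings;

# Illustrative value: four characters, the last with code point 0xE9.
my $upgraded = "caf\xE9";
my $encoded  = "caf\xE9";

# upgrade: same characters, UTF-8 internal storage, flag on.
my $octets = utf8::upgrade($upgraded);

# encode: each character replaced by its UTF-8 octets, flag off.
utf8::encode($encoded);

# Encoding the already-encoded octets again double-encodes them.
my $twice = $encoded;
utf8::encode($twice);

printf "upgrade: length=%d flag=%d octets=%d\n",
    length($upgraded), (utf8::is_utf8($upgraded) ? 1 : 0), $octets;
printf "encode:  length=%d flag=%d\n",
    length($encoded), (utf8::is_utf8($encoded) ? 1 : 0);
printf "twice:   length=%d\n", length($twice);
```

After the upgrade the string still compares equal to the original; only
the encoded copies have changed length.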

You can use your byte string as-is.  If you want to turn it back into
the OS-native encoding (might be needed for EBCDIC, I don't know), use
utf8::downgrade to undo utf8::upgrade, or utf8::decode to undo
utf8::encode.
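
The two round trips can be sketched as follows.  The sample string and
variable names are hypothetical, and an ASCII-based platform is
assumed.

```perl
use strict;
use warnings;

my $original = "caf\xE9";   # illustrative value, not from the thread

# upgrade / downgrade: internal representation only.
my $s = $original;
utf8::upgrade($s);
my $downgraded_ok = utf8::downgrade($s, 1);   # 1 = return false instead of dying

# encode / decode: characters to UTF-8 octets and back.
my $t = $original;
utf8::encode($t);
my $decoded_ok = utf8::decode($t);

print "downgrade round trip ok\n" if $downgraded_ok && $s eq $original;
print "decode round trip ok\n"    if $decoded_ok    && $t eq $original;
```

Passing a true second argument to utf8::downgrade makes it return
false, rather than die, when the string contains characters that do not
fit in a single byte.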

Check out perldoc perluniintro.

HTH,

--jay
