RE: UTF8 issue

Fernando Munoz Wed, 29 Jan 2003 10:58:23 -0800

Thanks Philip, that solves the problem. I also managed to find a less
elegant but equally effective solution myself. It operates on the string and
passes the result to a second scalar, which ends up encoded as a string of bytes:


# $description and $value are UTF8 encoded at this point
my ($description, $value) = split(":", $biblio[$n]);
# Here $value goes back to a string of bytes
$value = sprintf("%4.2f", $value);
my $lstring = length($description);
# Here $newdesc has $description as a string of bytes
my $newdesc = substr($description, 0, $lstring);

After this, the digests are all different and correct. It is not elegant, but
it works.

Thanks again.

-----Original Message-----
From: Philip Mak [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, January 29, 2003 10:07 AM
To: Fernando Munoz
Cc: '[EMAIL PROTECTED]'
Subject: Re: UTF8 issue


I'm guessing you'll have to somehow "cast" the UTF8 strings so that
they're interpreted byte-by-byte, rather than character-by-character.

Maybe try "use utf8;" and then pass utf8::encode($str) instead of $str
to the MD5 function.
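
A minimal sketch of that idea, under a few assumptions: the input is a
Perl character string, Digest::MD5 and Encode are available (both ship
with Perl 5.8), and md5_hex_utf8 is a hypothetical helper name used only
for illustration. Note that utf8::encode converts its argument in place
rather than returning the encoded string, so this sketch uses
Encode::encode_utf8, which returns the bytes and leaves the original
untouched:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: hash a character string by first turning it into
# its UTF-8 byte sequence, since MD5 is only defined for bytes.
sub md5_hex_utf8 {
    my ($text) = @_;
    return md5_hex(encode_utf8($text));
}

# Two strings that differ only in one character above ordinal 255.
my $first  = "caf\x{e9} \x{263a}";
my $second = "caf\x{e9} \x{2639}";

print md5_hex_utf8($first),  "\n";
print md5_hex_utf8($second), "\n";
```

The two printed digests should differ, because the strings encode to
different byte sequences.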

On Wed, Jan 29, 2003 at 09:50:13AM -0800, Fernando Munoz wrote:
> Well, there's no error logging that I can refer to, but when you try
> to hexdec these strings (the ones coming in UTF8) no matter how
> different the strings are, they always return the same digest.
> Searching around I find this note :
> 
> "Perl 5.8 support Unicode characters in strings. Since the MD5
> algorithm is only defined for strings of bytes, it can not be used
> on strings that contains chars with ordinal number above 255. The
> MD5 functions and methods will croak if you try to feed them such
> input data:"   
> 
> in the documentation for Digest::MD5
> (http://search.cpan.org/author/GAAS/Digest-MD5/MD5.pm). 
