From: "Jay Savage" <[EMAIL PROTECTED]>
Try to unpack the data--or a chunk of data you feel is large enough to
be representative--with the pattern U0U*. If the unpack succeeds with
no warnings, you have valid utf8. You could try the same thing with
Encode's 'decode_utf8' routine. See perluniintro for details. In both
cases, though, you need to make sure that you've grabbed well-formed
utf8 from the source file in the first place. If the data cuts off in
the middle of a multi-byte character, you'll get an error.
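A minimal sketch of the Encode-based check mentioned above, in case it helps. It uses Encode's decode with FB_CROAK rather than decode_utf8, so that malformed input raises an error that can be trapped. The helper name is_valid_utf8 and the sample byte strings are made up for illustration and are not from the original message.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Illustrative inputs (assumptions, not data from the thread).
print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";   # complete two-byte character
print is_valid_utf8("caf\xE9")     ? "valid\n" : "invalid\n";   # lone Latin-1 byte
print is_valid_utf8("caf\xC3")     ? "valid\n" : "invalid\n";   # cut off mid-character
```

The last case is the truncation problem described above: a chunk that ends in the middle of a multi-byte character is reported as invalid even if the whole file is fine.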
I have tried verifying the entire string, using the following:
my $result = unpack("U0U*", $content);
print $result;
Well, it gave no errors whether or not the string was UTF-8, but an
interesting thing is that the result printed was always 65279 if the string
was UTF-8 and 112 or 116 if the string was not UTF-8.
Do you know what these numbers represent? I am curious why it sometimes
prints 112 and sometimes 116 for ANSI strings.
I hope the result is consistent, so that I can rely on it in my
program to check whether a string is UTF-8.
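For what it is worth, the numbers look like code points of the first character only: unpack in scalar context returns just the first value it produces, 65279 is 0xFEFF (the byte order mark that some editors put at the start of UTF-8 files), and 112 and 116 are the ASCII codes of "p" and "t". The sketch below illustrates this under some assumptions: the sample strings are invented, and it uses the plain "U*" pattern on ASCII data plus Encode's decode for the BOM case, because the handling of "U0" has differed between Perl releases.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample data (assumptions): plain ASCII text, and a UTF-8 BOM followed by text.
my $ascii    = "text";
my $with_bom = "\xEF\xBB\xBFtext";

# Scalar context: unpack returns only the first value it produces.
my $first = unpack("U*", $ascii);
# List context: one value per character.
my @all = unpack("U*", $ascii);

print "scalar: $first\n";    # code point of the first character, "t"
print "list:   @all\n";      # code points of "t", "e", "x", "t"

# The first character of BOM-prefixed data, once decoded, is U+FEFF.
my $bom = ord(decode('UTF-8', $with_bom));
printf "BOM:    %d (U+%04X)\n", $bom, $bom;

# 112 and 116 are the ASCII codes of "p" and "t".
printf "chr(112) = %s, chr(116) = %s\n", chr(112), chr(116);
```

If that reading is right, the printed value only says which character the data starts with, so it is probably not a dependable test for UTF-8: a UTF-8 file saved without a BOM would print the code of its first letter as well.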
Thank you.
Octavian