On Thu, Mar 5, 2009 at 2:17 PM, Bill Stephenson <bi...@perlhelp.com> wrote:
> Okay, but now I'm curious. What does ord mean? (or do)

It's an abbreviation of "ordinal," and returns the position of the character within its charset - i.e., its ordinal value, as opposed to its text value.

Perl's ord() function is encoding-aware, but *only* if Perl knows what encoding the passed-in string is using, which it doesn't by default. If you "use utf8" at the top of your script, Perl knows that literal strings are utf-8 encoded, and flags them appropriately. Likewise if you use the :utf8 I/O layer to open a file handle, like this:

    open($fh, "<:utf8", $filename) or die "Could not open $filename: $!";

Bytes that are input from $fh are then assumed to be utf-8 encoded, and Perl sets an internal flag on the scalar to indicate this. That's what happened to the OP.

When scalars do *not* have their utf-8 flag set, ord(), length(), and other builtin functions fall back to assuming that they're encoded with one byte per character. If that assumption is incorrect - as it is in the OP's case, where the character takes two bytes in utf-8 encoding - then the results from these functions will likewise be incorrect.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
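
P.S. To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the core Encode module is available, and it uses U+00E9 (e with an acute accent) as a stand-in for a two-byte character, since the OP's actual data isn't shown in this message. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: these two bytes (0xC3 0xA9) are the utf-8 encoding of U+00E9.
# They stand in for the OP's character, which is not shown in this thread.
my $bytes = "\xC3\xA9";

# Undecoded: Perl treats each byte as a separate character.
my $byte_len = length($bytes);   # counts bytes
my $byte_ord = ord($bytes);      # value of the first byte only

# Decoded: the same data is now a single character.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length($chars);   # counts characters
my $char_ord = ord($chars);      # code point of that character

printf "undecoded: length=%d ord=%d\n", $byte_len, $byte_ord;
printf "decoded:   length=%d ord=%d\n", $char_len, $char_ord;
```

The undecoded scalar should report length 2 and ord 195 (0xC3), while the decoded one should report length 1 and ord 233 (0xE9). Reading the data through a handle opened with the :utf8 layer, as in the open() call above, is expected to give the same result as the explicit decode() here.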