On Thu, Mar 5, 2009 at 2:17 PM, Bill Stephenson <bi...@perlhelp.com> wrote:

> Okay, but now I'm curious. What does ord mean? (or do)


It's an abbreviation of "ordinal": ord() returns the numeric position of a
character within its charset - i.e., its ordinal value, as opposed to its
text value. Given a longer string, it looks only at the first character.
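
A quick sketch with a plain ASCII character, where the encoding question
doesn't come up yet; chr() is the inverse of ord():

```perl
use strict;
use warnings;

# "A" sits at position 65 in ASCII (and in Unicode).
print ord("A"), "\n";    # prints 65

# chr() goes the other way, from ordinal value back to character.
print chr(65), "\n";     # prints A
```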

Perl's ord() function is encoding-aware, but *only* if Perl knows what
encoding the passed-in string is using, which it doesn't by default. If you
"use utf8" at the top of your script, Perl knows that the script itself,
including its literal strings, is utf-8 encoded, and flags those strings
appropriately. Likewise if you use the :utf8 I/O layer to open a file
handle, like this:

    open(my $fh, "<:utf8", $filename) or die "Could not open $filename: $!";

Bytes that are input from $fh are then assumed to be utf-8 encoded, and Perl
sets an internal flag on the scalar to indicate this.
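
Here is a minimal, self-contained sketch of that. The scratch file and the
sample character "é" (U+00E9, bytes C3 A9 in utf-8) are my own choices for
illustration, not anything from the OP's script:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two raw bytes of a utf-8 encoded "é" to a scratch file.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA9";
close($out) or die "Could not close $filename: $!";

# Read it back through the :utf8 layer, so Perl decodes the bytes
# into characters as they come in.
open(my $fh, "<:utf8", $filename) or die "Could not open $filename: $!";
my $text = <$fh>;
close($fh);

printf "length=%d ord=%d\n", length($text), ord($text);   # length=1 ord=233
```

For input you don't control, the stricter :encoding(UTF-8) layer is usually
the safer choice, since it also validates what it reads.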

That flag was missing in the OP's case. When scalars do *not* have their
utf-8 flag set, ord(), length(), and other builtin functions fall back to
assuming that they're encoded with one byte per character. If that
assumption is incorrect - as it is in the OP's case, where the character
takes two bytes in utf-8 encoding - then the results from these functions
will likewise be incorrect: length() counts bytes, and ord() returns the
value of the first byte only.
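
To make the difference concrete, here is a small sketch that runs the same
two bytes through ord() and length() before and after decoding. I don't
know which character the OP actually had, so "é" (U+00E9) is just a
stand-in, and decoding with the Encode module is one way among several to
tell Perl what the bytes mean:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "é" as raw utf-8 bytes, written with escapes so the example does
# not depend on how this file itself is encoded.
my $bytes = "\xC3\xA9";

# Undecoded: Perl treats each byte as one character.
my ($byte_len, $byte_ord) = (length($bytes), ord($bytes));

# Decoded: Perl now knows the two bytes form a single character.
my $chars = decode("UTF-8", $bytes);
my ($char_len, $char_ord) = (length($chars), ord($chars));

printf "bytes: length=%d ord=%d\n", $byte_len, $byte_ord;   # length=2 ord=195
printf "chars: length=%d ord=%d\n", $char_len, $char_ord;   # length=1 ord=233
```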

sherm--

-- 
Cocoa programming in Perl: http://camelbones.sourceforge.net
