Re: confusing bullets

John Delacour Sun, 11 Jan 2004 05:16:11 -0800

At 9:26 pm -0500 10/1/04, Vic Norton wrote:

I'm sorry, John. I was talking figuratively. I didn't mean real bullets.

Figuratively or no, you were right on target with your choice. The bullet is a character in the 'macintosh' character set (wrongly called "MacRoman" by the Perl people) which does not exist in the widely used (or at least widely declared) charset Latin-1 and has the same 8-bit codepoint, 0xA5, as the yen sign « ¥ » in the Windows-1252 charset. It is to rebuild this Tower of Babel that Unicode was conceived and, far too slowly, brought into the computer world, first in Windows NT and finally in Mac OS X. Unicode is a 'good thing' but it has to be learned, and you'll come unstuck pretty often if you don't put aside a bit of time to do so.

<http://www.unicode.org/standard/WhatIsUnicode.html>

% perldoc -X Encode | more

SEE ALSO
    Encode::Encoding, Encode::Supported, Encode::PerlIO, encoding,
    perlebcdic, "open" in perlfunc, perlunicode, utf8, the Perl Unicode
    Mailing List <[EMAIL PROTECTED]>
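As a rough illustration of what Encode does with these characters, the sketch below prints the bytes that the bullet (U+2022) and the no-break space (U+00A0) turn into under three charsets. The helper name hex_bytes is made up for this example and is not part of Encode; the charset names are the ones Encode itself uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes that $char becomes in $charset, as spaced hex pairs.
# hex_bytes is a hypothetical helper for this illustration only.
sub hex_bytes {
    my ($charset, $char) = @_;
    my $bytes = encode($charset, $char);
    return join " ", map { sprintf("%02X", ord($_)) } split //, $bytes;
}

my $bullet = "\x{2022}";    # BULLET
my $nbsp   = "\x{00A0}";    # NO-BREAK SPACE

for my $charset (qw(MacRoman cp1252 UTF-8)) {
    printf "%-10s bullet: %-10s nbsp: %s\n",
        $charset, hex_bytes($charset, $bullet), hex_bytes($charset, $nbsp);
}
```

The no-break space comes out as CA in MacRoman and as C2 A0 in UTF-8, which is the pair of values in the question that follows. The bullet is A5 in MacRoman, 95 in cp1252 and E2 80 A2 in UTF-8.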

How come Perl sees "C2 A0" whenever HexEdit sees "CA" and vice versa? I don't care what kind of characters we are talking here. To paraphrase Gertrude Stein, "a byte is a byte is a byte." At least that's what I thought until now.

Gertrude Stein was a character. Some characters are a byte. Some are not, and you have to care.


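        use strict ;
        use warnings ;
        my $file = "/tmp/file";
        # Write character 0xF0 through the :utf8 layer: two bytes (C3 B0) on disk
        open my $out, ">:utf8", $file or die "Can't write $file: $!" ;
        print $out "\xF0" ;
        close $out ;
        # Read the bytes back with no layer and print them as they are
        open my $in, "<", $file or die "Can't read $file: $!" ;
        print "UTF-8:\t" .  <$in> . $/ ;
        close $in ;
        # Write the same character with no layer: one byte (F0) on disk
        open $out, ">", $file or die "Can't write $file: $!" ;
        print $out "\xF0" ;
        close $out ;
        # Read back again; the terminal shows the single byte in its own charset
        open $in, "<", $file or die "Can't read $file: $!" ;
        print "MacRoman:\t" .  <$in> . $/ ;
        close $in ;
        exit ;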
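
What the script above prints depends on the charset of the terminal it runs in. The variant below, a sketch rather than part of the original script, shows the bytes on disk as hex instead. The helper name written_hex is hypothetical, and a File::Temp scratch file stands in for /tmp/file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the character 0xF0 to a scratch file through the given output
# layer and return the bytes that landed on disk, as spaced hex pairs.
# written_hex is a hypothetical helper for this illustration only.
sub written_hex {
    my ($layer) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh;

    open my $out, ">$layer", $path or die "Can't write $path: $!";
    print $out "\xF0";
    close $out or die "Can't close $path: $!";

    # Read everything back as raw bytes, with no decoding.
    open my $in, "<:raw", $path or die "Can't read $path: $!";
    local $/;
    my $bytes = <$in>;
    close $in;

    return join " ", map { sprintf("%02X", ord($_)) } split //, $bytes;
}

print "with :utf8\t", written_hex(":utf8"), "\n";
print "with :raw\t",  written_hex(":raw"),  "\n";
```

One character goes in each time. The :utf8 layer puts two bytes on disk (C3 B0) and the :raw layer puts one (F0), so a character is not always a byte.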

JD
