Still futzing around with email and character sets.

Encode and perluniintro mention
octet          \x00 to \xff, a single byte (256 possible values)
string         characters in Perl's internal representation
code point     \x{...}, a character number that takes 1, 2 or more bytes once encoded

But I'm not sure about the order of things.
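
As far as I can tell the order is: octets come in, decode() turns them into a character string (code points), and encode() turns a string back into octets. A minimal sketch of that round trip, assuming UTF-8 input only because the bytes are easy to check by hand (the variable names are mine):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they would arrive off the wire: the UTF-8 encoding of U+00E9.
my $octets = "\xc3\xa9";

# decode: octets -> character string (a single code point here).
my $chars = decode('UTF-8', $octets);

# encode: character string -> octets again.
my $back = encode('UTF-8', $chars);

# Only plain ASCII is printed, so no binmode is needed for this line.
printf "octets: %d, chars: %d, code point: U+%04X\n",
    length($octets), length($chars), ord($chars);
# prints "octets: 2, chars: 1, code point: U+00E9"
```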

So I'll try this:

I have a MIME message part like the following:

Content-Type: text/plain;
        charset="BIG5"
Content-Transfer-Encoding: base64

1eLKx9K7t+JIVE1MuPHKvdDFvP6joQ0KCqFYoVihWKFYoVihWKFYoVihWKFYoVihWKFYoVihWKFY
oVihWKFYoVihWKFYoVihWKFYoVihWKFYoVihWAqhaapgt06haqRXrbGquoVvpfOBWK5lyU+lSKRV
pOWmcsbTi9ehQ6W7hLCl84Wyra2kX6ZYqmulzrN+IQqGR4VvpfOl0aFtVm9sbGV5bWFpbIVvpfO4
c4T6g/2uYaFuhLCl84T6sGWhRrNRykmkzYVUg2+zzIetrmAKqrqFb6XzuHOE+oSwpfOm06Zoprit
bqhEr3240aFJhGOnS4VkpFWGXqFBxtOtrYO6hX2oz6XOoUMKhkixoYhbhKGD9Kfag6iquqVEg6Sh
R2h0dHA6Ly93d3cuY255c29mdC5jb20v

MIME::Base64 has a function
my $decoded = decode_base64($DATA);
that returns really wonderful crud to my screen.  But I can't regex it.

I think it returns octets.  At least that's what MIME::Base64 says.
So I should be able to do

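use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

my $base64 = join('',<DATA>);             # the base64 body, as one string
my $octets = decode_base64($base64);      # raw Big5 bytes
my $utf8 = decode('Big5',$octets);        # Perl character string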

and from there I can use something like /(\w+)/ on it.
(But IIRC /[\w]+/ will act weird).
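
To convince myself that \w really sees characters after decoding, here is a small self-contained sketch. It builds a stand-in body by encoding a made-up sample string (two Han characters plus two ASCII words) to Big5 and base64, so $body and the sample text are assumptions, not the message above:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(decode_base64 encode_base64);

# Stand-in for a base64 Big5 body: U+4E2D U+6587 followed by two ASCII words.
my $body = encode_base64(encode('Big5', "\x{4E2D}\x{6587} mail test"));

my $octets = decode_base64($body);      # raw Big5 bytes
my $text   = decode('Big5', $octets);   # character string

# On the decoded string, \w matches whole characters, Han ones included.
my @words = $text =~ /(\w+)/g;

printf "%d words\n", scalar @words;     # prints "3 words"
print join(' ', map { sprintf 'U+%04X', ord } split //, $words[0]), "\n";
# prints "U+4E2D U+6587"
```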

Printing it out requires binmode(...), but I can work with the text inside
the program.
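
For the output side, a sketch of what I mean by binmode, assuming the terminal wants UTF-8 (that choice and the sample string are assumptions):

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string, as decode() would return it (U+4E2D U+6587).
my $text = "\x{4E2D}\x{6587}";

# Put an encoding layer on STDOUT so characters are turned into octets on
# the way out; without it Perl warns about "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";

# The same conversion done by hand, e.g. before writing to a socket.
my $out = encode('UTF-8', $text);
```

Here $out holds six octets, three for each of the two characters.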

Which is all good.  And I guess it's progress.
But can I expect to ALWAYS find a charset declaration on the Content-Type line if the text isn't plain ASCII? (There is sometimes a Content-Type in the message header, which I assume applies to all the parts.)
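
For what it's worth, RFC 2045 makes text/plain with charset=us-ascii the default when there is no Content-Type, though real mail does not always follow that. So a defensive sketch would fall back to a default instead of trusting the header to be there. charset_of is a hypothetical helper of mine, and the regex is a rough match, not a full MIME parameter parser:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the charset parameter out of a Content-Type
# value, falling back to us-ascii when the header or parameter is missing.
sub charset_of {
    my ($content_type) = @_;
    return 'us-ascii' unless defined $content_type;
    if ($content_type =~ /\bcharset\s*=\s*"?([^\s";]+)"?/i) {
        return lc $1;
    }
    return 'us-ascii';
}

# The folded header value from the message part above.
my $header = qq{text/plain;\n        charset="BIG5"};

print charset_of($header), "\n";        # prints "big5"
print charset_of('text/plain'), "\n";   # prints "us-ascii"
```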
