-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 20 Feb 2013, Philip Prindeville wrote:

Awesome, that worked!

I'm wondering if in MIME::Body we should take:

sub as_string {
    my $self = shift;
    my $str = '';
    my $fh = IO::File->new(\$str, '>:') or croak("Cannot open in-memory file: $!");
    $self->print($fh);
    close($fh);
    return $str;
}

and have:

return Encode::decode($charset, $str);

I suppose that would break the internals of the functions in the MIME:: and Mail:: namespaces. They are tied together very closely.

Actually, I looked into a UTF-8 version of MIME-tools a few years back, to overcome character set problems when storing header data in a Postgres database. My idea was that everything the MIME:: functions return should be in Perl's internal UTF-8, with any character set information already decoded, and that anything passed into the functions should be in Perl's internal UTF-8 as well. I think one would need to rewrite the whole framework from scratch.

instead, but I'm not sure how we'd retrieve $charset… It would need to be stored in MIME::Body, which isn't currently the case.
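
To make the idea concrete, here is a minimal sketch of decoding the body octets once the charset is known. It is not part of MIME::Body: decode_body_string() is a hypothetical helper, and the US-ASCII fallback is my own assumption. In MIME-tools the charset could probably be looked up by the caller from the entity's head (something like $entity->head->mime_attr('content-type.charset')) and passed in.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, NOT part of MIME::Body: decode raw body octets
# using a charset the caller has looked up from the Content-Type header.
sub decode_body_string {
    my ($octets, $charset) = @_;

    # Assumption: fall back to US-ASCII when the charset is missing or
    # unknown to Encode.
    my $enc = defined $charset ? Encode::find_encoding($charset) : undef;
    $enc ||= Encode::find_encoding('us-ascii');

    # Decode a copy so the caller's octets are never modified in place.
    my $copy = $octets;
    return $enc->decode($copy);
}

# "Gruesse" with u-umlaut and sharp s, as ISO-8859-1 octets.
my $text = decode_body_string("Gr\xFC\xDFe", 'iso-8859-1');
printf "%d characters\n", length $text;
```

The helper takes the charset as an argument precisely because MIME::Body does not store it; where it really comes from is left to the caller.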

Encode is a tricky module in its own right. From perldoc Encode:

"Handling Malformed Data
The optional CHECK argument tells Encode what to do when it encounters malformed data. Without CHECK, Encode::FB_DEFAULT ( == 0 ) is assumed.

As of version 2.12 Encode supports coderef values for CHECK. See below.

NOTE: Not all encodings support this feature. Some encodings ignore the CHECK argument. For example, Encode::Unicode ignores CHECK and it always croaks on error.
"

Some encodings modify the $str argument in place so that it holds only the characters NOT decoded. So you'd call Encode::decode($charset, "".$str) to force a copy, at the cost of a performance penalty.
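
A small sketch of that in-place modification, assuming the strict 'UTF-8' encoding and the FB_QUIET check value (my choices for illustration). As far as I can tell from perldoc Encode, the LEAVE_SRC flag is the documented way to keep the input untouched without copying by hand:

```perl
use strict;
use warnings;
use Encode ();

my $original = "abc\xFFdef";    # \xFF is never valid in UTF-8

# With a CHECK value and no LEAVE_SRC, decode() may rewrite its argument
# so that it holds only the part that was not decoded.
my $src     = $original;
my $decoded = Encode::decode('UTF-8', $src, Encode::FB_QUIET());

# Adding LEAVE_SRC asks Encode to leave the input alone.
my $kept  = $original;
my $again = Encode::decode('UTF-8', $kept,
                           Encode::FB_QUIET() | Encode::LEAVE_SRC());

printf "without LEAVE_SRC: source %s\n",
    $src eq $original ? 'unchanged' : 'modified';
printf "with LEAVE_SRC:    source %s\n",
    $kept eq $original ? 'unchanged' : 'modified';
```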

I also got weird results with decode('latin1', $str). I suspect the reason is this caveat from perldoc Encode: "CAVEAT: When you run "$string = decode("utf8", $octets)", then $string may not be equal to $octets. Though they both contain the same data, the UTF8 flag for $string is on unless $octets entirely consists of ASCII data (or EBCDIC on EBCDIC machines)." When I pass the results of decode('latin1', $str) to LDAP or Postgres, I sometimes get errors.

I now pass all strings through a function that looks terrible, but since then Web, Postgres, LDAP and text files have played together nicely.
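
For illustration only, a much simpler boundary helper than my real one might look like the sketch below. The name to_utf8_octets() is hypothetical, and it assumes the receiving side wants UTF-8 octets; whether a given module expects octets or character strings depends on the module and its configuration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical boundary helper: turn a Perl character string into UTF-8
# octets right before handing it to a service or writing it to a file.
sub to_utf8_octets {
    my ($chars) = @_;
    return Encode::encode('UTF-8', $chars);
}

my $plain    = "caf\xE9";    # character string, UTF8 flag typically off
my $upgraded = $plain;
utf8::upgrade($upgraded);    # same characters, UTF8 flag on

# Both internal representations should yield the same octets.
my $same = to_utf8_octets($plain) eq to_utf8_octets($upgraded);
print $same ? "identical octets\n" : "different octets\n";
```

Encoding explicitly at the boundary means the result no longer depends on whether the UTF8 flag happens to be set on the string.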

On Feb 20, 2013, at 6:21 PM, David F. Skoll <[email protected]> wrote:
Try putting "use Encode;" near the top of your test file and replacing

utf8::upgrade($string);

with:

$string = Encode::decode('utf-8', $string);

In fact, I found that utf8::upgrade() works for me as a replacement for decode('latin1'), which seems to "do nothing" and so causes other modules, such as Net::LDAP or DBD::Pg, to pass invalid UTF-8 to the services.
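
The two calls are not interchangeable in general, though. A short sketch of the difference, assuming an ASCII (non-EBCDIC) platform and input that is UTF-8 encoded:

```perl
use strict;
use warnings;
use Encode ();

my $octets = "\xC3\xA9";    # the UTF-8 encoding of U+00E9 (e-acute)

# utf8::upgrade() only changes the internal representation: each byte is
# still one character, read as Latin-1.
my $upgraded = $octets;
utf8::upgrade($upgraded);

# decode() interprets the bytes as UTF-8 and returns the characters.
my $decoded = Encode::decode('UTF-8', $octets);

printf "upgrade: %d characters\n", length $upgraded;
printf "decode:  %d characters\n", length $decoded;
```

So utf8::upgrade() fits when the bytes really are Latin-1, while decode('UTF-8', ...) fits when they are UTF-8 octets.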

- -- Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEVAwUBUSX4uZ8mjdm1m0FfAQJLPAf9EPC0E+gm5cJ4PvwxQHT2MzGoTmfLz1/C
nd7kihJnCqmWHQeYLhRlETqX4D1vG/ZGS6WbaP8Fybn400Tfb4JZBs9kZafS7dri
z3r6wk70Vd0By7GM5zIPlTbovU7HqiIFBBoHrdLkaSvzGq95ZfyH5u8aZjj39D85
2nDracTpxp9VF1rsgDi9I3z2lJpRjtJsufVUTvIhynOghQoAhw0S8FEAp7CrLnOX
UHsTTW1+CPhJA3zxY7jgGKV65smNYjtB4MZ1D0cxq2Y6Op7R2NmbRZrlXfFsfMBs
ah7y6nOmlOOpJ1oG760qZY31GjAcvuHgzcliV6rBXueMb1qSM3yHyw==
=A/mV
-----END PGP SIGNATURE-----