On Mar 7, 2006, at 07:58, David Wheeler wrote:
All of the characters in question are control characters (rarely
used) in all character sets except CP1252 (and other CPs?), which
is why Encode doesn't convert them. But it's been a while since I
looked into this, so I'm not clear on the details anymore.
Oh, I remember now. Encode does convert them if you use it to go from
CP1252 to UTF-8. At least, in my tests it worked properly:
use Encode;
# CHECK = 1 (Encode::FB_CROAK): decode() dies on bytes it cannot map
my $utf8_text = decode('cp1252', $cp1252_text, 1);
I was originally going to add support for converting from the CP1252
gremlins to UTF-8, but when I found that Encode already did it
properly, I eliminated it. My module is only for those who require
Latin 1, which doesn't support those extra characters. Even if you
go from CP1252 to UTF-8 to Latin 1, the
characters won't come through, because they simply don't exist in
Latin 1.
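A minimal sketch of both points, assuming a reasonably recent Perl with the core Encode module. It uses byte 0x80 (the euro sign in CP1252) as the example: decoding from CP1252 yields U+20AC, which survives the trip to UTF-8, but Latin 1 has no euro sign, so encoding to iso-8859-1 either substitutes '?' (default CHECK) or dies (FB_CROAK). The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Byte 0x80 is the euro sign in CP1252 but a C1 control code in Latin 1.
my $cp1252_bytes = "\x80";

# Decoding from CP1252 yields U+20AC EURO SIGN as a Perl character string.
my $chars = decode('cp1252', $cp1252_bytes);
printf "decoded: U+%04X\n", ord $chars;

# Encoding to UTF-8 preserves the character as three bytes.
my $utf8 = encode('UTF-8', $chars);
printf "UTF-8 bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;

# Latin 1 has no euro sign: with the default CHECK, Encode substitutes '?'.
my $latin1 = encode('iso-8859-1', $chars);
print "Latin 1: $latin1\n";

# With FB_CROAK the loss becomes a fatal error instead of a silent substitution.
my $ok = eval { encode('iso-8859-1', $chars, Encode::FB_CROAK()); 1 };
print 'Latin 1 strict: ', ($ok ? 'ok' : 'croaked'), "\n";
```

The same pattern applies to the other CP1252-only characters in 0x80-0x9F, such as the curly quotes and dashes.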
Am I misunderstanding things, Dan?
Best,
David