On Mar 7, 2006, at 07:58, David Wheeler wrote:

All of the characters in question are (rarely used) control characters in every character set except CP1252 (and maybe other Windows code pages?), which is why Encode doesn't convert them. But it's been a while since I looked into this, so I'm no longer clear on the details.
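A minimal sketch of the difference, assuming Perl's core Encode module; the byte string is a made-up sample, and `code_points` is a hypothetical helper for display only. Decoding the same bytes as ISO-8859-1 gives C1 control characters, while decoding them as CP1252 gives the Euro sign and curly quotes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: the CP1252 bytes for a Euro sign (0x80)
# and a pair of curly double quotes (0x93, 0x94).
my $bytes = "\x80\x93\x94";

# Hypothetical helper: list the Unicode code points in a character string.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

# ISO-8859-1 maps each byte to the code point with the same value,
# so these bytes decode to C1 control characters.
my $as_latin1 = decode('iso-8859-1', $bytes);

# CP1252 assigns printable characters to the same byte values.
my $as_cp1252 = decode('cp1252', $bytes);

print 'iso-8859-1: ', code_points($as_latin1), "\n";   # U+0080 U+0093 U+0094
print 'cp1252:     ', code_points($as_cp1252), "\n";   # U+20AC U+201C U+201D
```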

Oh, I remember now: if you use Encode to convert from CP1252 to UTF-8, those characters come through correctly. At least, that's what I found in my tests:


  use Encode qw(decode);
  # CHECK = 1 (Encode::FB_CROAK): die on malformed input
  my $utf8_text = decode('cp1252', $cp1252_text, 1);

I was originally going to add support for converting the CP1252 gremlins to UTF-8, but when I found that Encode already did it properly, I dropped it. My module is only for those who require Latin 1, which has no code points for those extra characters. Even if you go from CP1252 to UTF-8 to Latin 1, the characters won't come through, because they simply don't exist in Latin 1.
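A minimal sketch of that round trip, assuming Perl's core Encode module and a made-up input string: the Euro sign survives CP1252 to UTF-8, but has nowhere to go in Latin 1. With the default CHECK, Encode substitutes `?`; with `Encode::FB_CROAK` it dies instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical CP1252 input: "5" followed by the Euro sign (byte 0x80).
my $cp1252_text = "5\x80";

# CP1252 bytes -> Perl character string; the Euro sign becomes U+20AC.
my $chars = decode('cp1252', $cp1252_text);

# Characters -> UTF-8 bytes -> characters again: nothing is lost here.
my $utf8_bytes = encode('UTF-8', $chars);
my $from_utf8  = decode('UTF-8', $utf8_bytes);

# Characters -> Latin 1 bytes. U+20AC has no Latin 1 equivalent,
# so with the default CHECK Encode substitutes '?'.
my $latin1 = encode('iso-8859-1', $from_utf8);
print "$latin1\n";    # prints "5?"

# With Encode::FB_CROAK the same conversion dies instead. CHECK without
# Encode::LEAVE_SRC may modify its source, so pass a copy.
my $copy = $from_utf8;
my $ok = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK()); 1 };
print $ok ? "converted\n" : "not converted: $@";
```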

Am I misunderstanding things, Dan?

Best,

David
