Re: HTML::Entities and WinLatin1 NCRs [PATCH]

David Wheeler Tue, 07 Mar 2006 11:39:01 -0800

On Mar 7, 2006, at 07:58, David Wheeler wrote:

All of the characters in question are control characters (rarely used) in all character sets except CP1252 (and other CPs?), which is why Encode doesn't convert them. But it's been a while since I looked into this, so I'm not clear on the details anymore.

Oh, I remember now. If you use Encode to convert from CP1252 to UTF-8, the conversion works. At least I found that, in my tests, it worked properly:



  use Encode;
  # CHECK = 1 (Encode::FB_CROAK): die on malformed input
  $utf8_text = decode('cp1252', $cp1252_text, 1);

I was originally going to add support for converting from the CP1252 gremlins to UTF-8, but when I found that Encode already did it properly, I eliminated it. My module is only for those who require Latin 1 support, in which there is no support for those extra characters. Even if you go from CP1252 to UTF-8 to Latin 1, the characters won't come through, because they simply don't exist in Latin 1.
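
A minimal sketch of that round trip, assuming a stock Encode install. The sample bytes and variable names are made up for illustration and are not taken from the module or the patch under discussion:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: CP1252 bytes for a left curly quote (0x93),
# "hi", a right curly quote (0x94), a space, and a euro sign (0x80).
my $cp1252_bytes = "\x93hi\x94 \x80";

# Decode the CP1252 bytes into a Perl character string.
my $chars = decode('cp1252', $cp1252_bytes);

# The "gremlins" become real Unicode code points, not C1 controls.
my $codepoints = join ' ', map { sprintf 'U+%04X', ord } split //, $chars;

# Encoding to UTF-8 keeps every character.
my $utf8 = encode('UTF-8', $chars);

# Encoding to Latin 1 cannot: with the default CHECK of 0, Encode
# substitutes '?' for each character that has no Latin 1 equivalent.
my $latin1 = encode('iso-8859-1', $chars);

print "$codepoints\n";
print "$latin1\n";
```

Passing a CHECK value such as Encode::FB_CROAK to encode() would make the Latin 1 step die instead of silently substituting, which may be preferable when data loss must not go unnoticed.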


Am I misunderstanding things, Dan?

Best,

David
