Re: [cp-patches] Cp037 and Cp868 charsets

Sven de Marothy Sun, 16 Jul 2006 20:33:28 -0700

On Sun, 2006-07-16 at 21:04 +0100, George McLachlan wrote:
> Just to dip my toe in the water, so to speak, I wrote a couple of simple
> charsets, which haven't been implemented yet.


Already? Great! OK, before we can commit this we need to get the
paperwork done (the copyright assignment, as I said), so I'm forwarding
this to Mark, who'll get you set up with that. I hope. (Priority, Mark!)

> Of course I might have got it completely wrong.

Seems good! As Tromey pointed out, you forgot to add them to the
Provider class, but that's a simple matter. I tested the encoders
(being somewhat pedantic) against the JDK and they seem just fine.
Only one character differed in Cp037 (your mapping is the better
one).
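
A comparison along these lines can be sketched as below. This is only a
minimal sketch, not the test that was actually run: the class name
CharsetDiff and the method diff are made up for illustration, it compares
the decoding direction (byte to char) rather than the encoders, and it
assumes both charset names are available in the runtime it is run on.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the byte values that two single-byte
// charsets decode to different strings.
class CharsetDiff {

    static List<Integer> diff(String nameA, String nameB) {
        // Charset.forName throws UnsupportedCharsetException if a name
        // is not known to this runtime.
        Charset a = Charset.forName(nameA);
        Charset b = Charset.forName(nameB);
        List<Integer> differing = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            byte[] one = { (byte) i };
            // Unmappable bytes decode to U+FFFD in both cases, so two
            // charsets that both lack a mapping compare as equal.
            String fromA = new String(one, a);
            String fromB = new String(one, b);
            if (!fromA.equals(fromB)) {
                differing.add(i);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // The defaults are placeholders; pass e.g. two names for the
        // same code page from different providers to compare them.
        String nameA = args.length > 0 ? args[0] : "ISO-8859-1";
        String nameB = args.length > 1 ? args[1] : "US-ASCII";
        for (int value : diff(nameA, nameB)) {
            System.out.printf("0x%02X%n", value);
        }
    }
}
```

Comparing two implementations of the same charset name needs them to be
loaded from different runtimes (or dumped to files first), since a single
VM resolves a name to one implementation.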

A bunch of characters differed in Cp868, so that's the case where
I'd look a bit closer to see what's going on.
(This page notes the discrepancy:
http://www.haible.de/bruno/charsets/conversion-tables/CP868.html )

In this case isolated Arabic forms are being mapped to the Arabic code
block in your table, and to the isolated forms in the Arabic
Presentation Forms code block in the JDK's table. Luckily I read the
Unicode spec for this recently and it explicitly states that APF codes
should *not* be used for information interchange. 
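
To make the distinction concrete, here is a small sketch that checks
which block a code point falls in and folds a presentation form back to
the ordinary Arabic letter. The class and method names (ArabicForms,
isPresentationForm, toNominal) are made up for illustration, and using
NFKC normalization for the folding is my assumption about a reasonable
way to do it, not something either table does.

```java
import java.text.Normalizer;

// Hypothetical helper for inspecting Arabic code points.
class ArabicForms {

    // True if the code point lies in Arabic Presentation Forms-A or -B.
    static boolean isPresentationForm(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
            || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    // Compatibility normalization replaces a presentation form with the
    // letter(s) from the main Arabic block.
    static String toNominal(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        int isolatedAlef = 0xFE8D; // ARABIC LETTER ALEF ISOLATED FORM
        int alef = 0x0627;         // ARABIC LETTER ALEF
        System.out.println(isPresentationForm(isolatedAlef));
        System.out.println(isPresentationForm(alef));
        String folded = toNominal(new String(Character.toChars(isolatedAlef)));
        System.out.printf("U+%04X%n", folded.codePointAt(0));
    }
}
```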

So while there's no official Unicode mapping for this charset, the JDK
is still in violation of the spec here. So no worries!
Of course, you don't have to be as pedantic as I am :)
I'd trust libiconv's tables. This certainly gives reason to.

/Sven
