On Sun, 2006-07-16 at 21:04 +0100, George McLachlan wrote:
> Just to dip my toe in the water, so to speak, I wrote a couple of simple
> charsets, which haven't been implemented yet.
Already? Great!

OK, before we can commit this we need to get the paperwork done (the copyright assignment, as I said), so I'm forwarding this to Mark, who'll get you set up with that. I hope. (Priority, Mark!)

> Of course I might have got it completely wrong.

Seems good! As Tromey pointed out, you forgot to add them to the Provider class, but that's a simple matter.

I tested the encoders (being somewhat pedantic) against the JDK, and they seem just fine. One char differed in Cp037 (your mapping is the better one). A bunch of chars differed in Cp868, so that's the case where I'd look a bit closer to see what's going on. (This page notes the discrepancy: http://www.haible.de/bruno/charsets/conversion-tables/CP868.html )

In this case, isolated Arabic forms are mapped to the Arabic code block in your table, but to the isolated forms in the Arabic Presentation Forms code block in the JDK's table. Luckily, I read the Unicode spec on this recently, and it explicitly states that APF codes should *not* be used for information interchange. So while there's no official Unicode mapping for this charset, the JDK is still in violation of the spec here. So no worries!

Of course, you don't have to be as pedantic as I am :) I'd trust libiconv's tables, and this certainly gives us reason to.

/Sven
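For anyone who wants to repeat that kind of comparison, here is a minimal sketch of one way to do it; it is not necessarily how the test above was run. It assumes you dump each VM's encoder table to a file and compare the files with `diff`. The class name `CharsetDump`, the line format, and the `[APF; ...]` tag are made up for illustration. Characters in the Arabic Presentation Forms blocks are tagged with their NFKC (compatibility) equivalent, which for an isolated form such as U+FE8D is the plain Arabic-block letter (U+0627), so the kind of Cp868 mismatch described above stands out.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: dumps a charset's encoder table so two VMs can be diffed.
public class CharsetDump {

    /** True if c lies in one of the two Arabic Presentation Forms blocks. */
    public static boolean isArabicPresentationForm(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
            || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
    }

    /**
     * Returns one line per BMP char the charset can encode, e.g. "U+0041 41".
     * Unmappable chars and lone surrogates are skipped.
     */
    public static List<String> dump(Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i <= 0xFFFF; i++) {
            char c = (char) i;
            ByteBuffer out;
            try {
                // encode(CharBuffer) resets the encoder, encodes and flushes.
                out = enc.encode(CharBuffer.wrap(new char[] { c }));
            } catch (CharacterCodingException e) {
                continue; // not in this charset's table
            }
            StringBuilder line = new StringBuilder(String.format("U+%04X", i));
            while (out.hasRemaining()) {
                line.append(String.format(" %02X", out.get() & 0xFF));
            }
            if (isArabicPresentationForm(c)) {
                // Tag APF chars with their NFKC form (e.g. U+FE8D -> U+0627).
                String folded = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC);
                line.append(" [APF; NFKC:");
                for (char f : folded.toCharArray()) {
                    line.append(String.format(" U+%04X", (int) f));
                }
                line.append("]");
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "US-ASCII");
        for (String line : dump(cs)) {
            System.out.println(line);
        }
    }
}
```

Usage would be something like `java CharsetDump Cp868 > table.txt` under each VM (assuming the VM knows the charset under that name), followed by a plain `diff` of the two files. This only checks the encoding direction; a decoder-side dump over all byte values would be the natural complement.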
