I spent a little time recently looking at the character set conversion
code in Classpath and libgcj. I don't have time to do the merge right
now, but I thought I'd post my recommendations for posterity.
Both code bases take roughly the same approach. Only details differ.
My recommendation is to take the libgcj framework, and then heavily
modify it to incorporate the good points from Classpath:
* libgcj's framework appears to be mildly more efficient
* ... it is also a bit cleaner in some minor respects (e.g., getName()
is an abstract method, so each class isn't required to set
`scheme_name'. It also doesn't rely on reflection as heavily.)
* Classpath has javadoc comments, which are nice for people writing
new converters. This feature must be preserved. (Yes, I've come
180 degrees on this issue. :-)
* Classpath has a nicer naming scheme for encoders and decoders.
We should also continue to use Classpath's choice for package name.
* The libgcj UTF-8 converters seem a bit better. They are known to be
buggy in some (unusual) cases, but they handle more than Classpath's
do. (e.g., Java-style \0 encoding, where U+0000 is written as the two
bytes 0xC0 0x80, is handled correctly, as an option.)
* Otherwise there isn't much overlap between supported encodings, so
we can just use the union of the two.
* Right now libgcj can handle using iconv() as a fallback if the
encoding is not built in. We should continue to do this. We should
also consider optionally using something like Bruno Haible's
libiconv to support many encodings on those platforms without a
native iconv() implementation.
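
To make a few of the points above concrete, here is a rough sketch in
Java. It is not taken from either code base: the class names
(ConverterSketch, Encoder, Utf8Encoder), the "UTF8" name string, and the
lookup() helper are all made up for illustration. It shows getName() as
an abstract method, a UTF-8 encoder with the Java-style \0 encoding as
an option, and a lookup that tries the built-in converters first and
only then calls a fallback (which is where an iconv()-backed converter
could plug in).

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ConverterSketch {
    /** Hypothetical base class: the name comes from a method, not a field. */
    abstract static class Encoder {
        abstract String getName();
        abstract byte[] encode(String s);
    }

    /** UTF-8 encoder; optionally writes U+0000 as 0xC0 0x80 (Java style). */
    static final class Utf8Encoder extends Encoder {
        private final boolean javaStyleNul;

        Utf8Encoder(boolean javaStyleNul) { this.javaStyleNul = javaStyleNul; }

        @Override String getName() { return "UTF8"; }

        @Override byte[] encode(String s) {
            // Sketch only: unpaired surrogates are not rejected, and only the
            // \0 part of Java's modified UTF-8 is implemented.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);
                i += Character.charCount(cp);
                if (cp == 0 && javaStyleNul) {
                    out.write(0xC0);
                    out.write(0x80);
                } else if (cp < 0x80) {
                    out.write(cp);
                } else if (cp < 0x800) {
                    out.write(0xC0 | (cp >> 6));
                    out.write(0x80 | (cp & 0x3F));
                } else if (cp < 0x10000) {
                    out.write(0xE0 | (cp >> 12));
                    out.write(0x80 | ((cp >> 6) & 0x3F));
                    out.write(0x80 | (cp & 0x3F));
                } else {
                    out.write(0xF0 | (cp >> 18));
                    out.write(0x80 | ((cp >> 12) & 0x3F));
                    out.write(0x80 | ((cp >> 6) & 0x3F));
                    out.write(0x80 | (cp & 0x3F));
                }
            }
            return out.toByteArray();
        }
    }

    /** Built-in converters, keyed by encoding name. */
    static final Map<String, Encoder> BUILT_IN = new HashMap<>();
    static {
        BUILT_IN.put("UTF8", new Utf8Encoder(false));
    }

    /** Try the built-in table first; otherwise ask the fallback factory. */
    static Encoder lookup(String name, Function<String, Encoder> fallback) {
        Encoder e = BUILT_IN.get(name);
        return e != null ? e : fallback.apply(name);
    }

    public static void main(String[] args) {
        byte[] plain = new Utf8Encoder(false).encode("a\0");
        byte[] java = new Utf8Encoder(true).encode("a\0");
        System.out.println(plain.length + " vs " + java.length); // prints "2 vs 3"
    }
}
```

The fallback is passed in as a function so that the sketch stays
self-contained; in a real merge it would presumably be whatever wraps
the native iconv() call.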
I'll do this sometime when I have a free weekend or something, unless
somebody beats me to it.
This, like so many other things, is possibly contingent on getting the
CNI/JNI problem resolved. I'm going to try to get some time scheduled
to work on that. Don't hold your breath :-(
Tom