>>>>> "Marcin" == Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes:
>> Java's FileReader class (which implicitly converts char* to
>> Unicode) takes an encoding argument. The list of permitted
>> encodings is again platform- and version-dependent. It is best not
>> to use this explicit encoding argument and to rely on the
>> locale-dependent default value.
Marcin> How do Java implementations find this locale dependent default
Marcin> value? Do they use e.g. iconv for the actual conversion? Or
Marcin> determine only the name of the encoding somehow and implement
Marcin> the conversion themselves?
I don't know about other Java implementations, but the one I work on
has some built-in conversions (we have UTF-8 and SJIS), but it also
knows how to fall back on iconv() on systems that have it.
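Roughly, the iconv() fallback amounts to something like this (just a
sketch, not our actual code; the function and buffer handling are made
up for illustration):

    #include <iconv.h>
    #include <stdio.h>

    /* Convert 'inbuf' from the charset named 'from' into UTF-8.
       Illustrative only; a real converter loops, grows the output
       buffer, and handles E2BIG/EILSEQ/EINVAL properly. */
    static int to_utf8(const char *from, char *inbuf, size_t inlen,
                       char *outbuf, size_t outlen)
    {
        iconv_t cd = iconv_open("UTF-8", from);
        if (cd == (iconv_t) -1) {
            perror("iconv_open");   /* encoding name unknown here */
            return -1;
        }

        char *in = inbuf, *out = outbuf;
        size_t r = iconv(cd, &in, &inlen, &out, &outlen);
        iconv_close(cd);

        return r == (size_t) -1 ? -1 : (int) (out - outbuf);
    }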
We currently hard-code the default encoding to Latin-1, which is lame.
I've recently made our compiler smarter about this, and it uses
nl_langinfo(CODESET) (on platforms that have it) to find the default
encoding name. Of course, that isn't portable in practice, but at
least it works in some situations.
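For the record, querying the locale for its codeset name is only a
couple of calls (a sketch, assuming a system that provides
nl_langinfo()):

    #include <locale.h>
    #include <langinfo.h>
    #include <stdio.h>

    int main(void)
    {
        /* Pick up LC_CTYPE etc. from the environment, then ask for
           the codeset name of the current locale, e.g. "UTF-8". */
        setlocale(LC_ALL, "");
        printf("default encoding: %s\n", nl_langinfo(CODESET));
        return 0;
    }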
It's all rather ugly because even if a system supports iconv() there
is no guarantee that it will choose names for encodings the same way
that any other platform does. That means that a robust Java
implementation would have to map the Java encoding names (which aren't
really standardized in Java but are listed in an example in the JCL
book) to platform-specific names in some way. We haven't done this
yet, simply because there's been no demand.
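A sketch of the kind of table that would be needed (the platform-side
names shown are hypothetical; real systems disagree on them, which is
exactly the problem):

    #include <string.h>
    #include <stddef.h>

    /* Map a Java-style encoding name to whatever this platform's
       iconv() happens to call it.  The right-hand names below are
       hypothetical examples; in reality they would have to be probed
       or configured per system. */
    struct alias { const char *java_name; const char *iconv_name; };

    static const struct alias aliases[] = {
        { "8859_1", "ISO-8859-1" },
        { "SJIS",   "SHIFT_JIS"  },
        { "UTF8",   "UTF-8"      },
    };

    static const char *map_encoding(const char *java_name)
    {
        for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
            if (strcmp(aliases[i].java_name, java_name) == 0)
                return aliases[i].iconv_name;
        return java_name;   /* no alias known; hope iconv recognizes it */
    }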
Tom