Hi,

I've always wondered what the JDBC driver does when it isn't accessing a
Unicode database. What it does is quite inconvenient: AFAIK it assumes
ISO-8859-1. IMHO, the charset used for byte data _should_ be a
parameter of the connection URL, and it should default to Java's default
charset - well, or to ISO-8859-1 if you like that better.
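
Just to make the idea concrete, here's a rough sketch of how such a
parameter could look. Note that the "charset" URL property is purely
hypothetical - the current driver doesn't offer it; only the
jdbc:sapdb URL scheme and the driver class are real:

import java.sql.Connection;
import java.sql.DriverManager;

public class CharsetUrlSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.sap.dbtech.jdbc.DriverSapDB");
        // Hypothetical: "charset" is NOT an existing MaxDB JDBC
        // property, it only illustrates the proposed parameter.
        Connection con = DriverManager.getConnection(
            "jdbc:sapdb://localhost/TST?charset=ISO-8859-15",
            "user", "secret");
        // The driver would then encode/decode all CHAR/VARCHAR data
        // with ISO-8859-15 instead of assuming ISO-8859-1.
        con.close();
    }
}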

On the other hand, there's another problem:
What if I connect with "unicode=yes" to a non-Unicode database? I guess
the MaxDB kernel will convert the Unicode strings back to byte strings -
but which charset is used for that? I guess this question also applies
to writing strings to the database.
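
For reference, this is the kind of situation I mean (I'm assuming here
that "unicode=yes" can simply be appended to the URL, and that a table
t with a CHAR column c exists - adjust as needed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UnicodeWriteSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.sap.dbtech.jdbc.DriverSapDB");
        Connection con = DriverManager.getConnection(
            "jdbc:sapdb://localhost/TST?unicode=yes",
            "user", "secret");
        PreparedStatement ps =
            con.prepareStatement("INSERT INTO t (c) VALUES (?)");
        // If TST is a non-Unicode database: which charset does the
        // kernel use to turn this UTF-16 string into bytes?
        ps.setString(1, "d\u00e9j\u00e0 vu");
        ps.executeUpdate();
        con.close();
    }
}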

IMHO the JDBC driver should default to "unicode=yes", but with an
adjustable charset for all conversions from Unicode strings to byte
strings - even those conversions that take place in the MaxDB kernel.

Non-Unicode databases are therefore currently unusable for JDBC clients
if the applications that write into the database (non-JDBC clients)
don't use ISO-8859-1. In most cases, the charset these applications use
will depend on their current environment (i.e. the locale).
The main point is: I guess that byte strings are copied into the
database unchecked - I mean, you cannot assume that these strings are
ISO-8859-1 or anything else. They are just byte strings.
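
Here's a tiny self-contained example of what I mean by "just byte
strings" - the same bytes mean different things depending on the
charset you decode them with (plain JDK, no MaxDB involved):

import java.io.UnsupportedEncodingException;

public class ByteStringDemo {
    public static void main(String[] args)
            throws UnsupportedEncodingException {
        // A non-JDBC client running in a UTF-8 locale writes "ä":
        byte[] raw = "\u00e4".getBytes("UTF-8"); // 0xC3 0xA4

        // A JDBC client reading the same column, with the driver
        // assuming ISO-8859-1, gets mojibake:
        System.out.println(new String(raw, "ISO-8859-1")); // "Ã¤"

        // With a configurable charset it could get the right value:
        System.out.println(new String(raw, "UTF-8"));      // "ä"
    }
}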

On the other hand, Unicode databases are currently unusable for
clients like DBD::MaxDB, ODBC (using the byte-string API), ...

All that I've said also applies to ODBC (if the ODBC Unicode API is
used). The ODBC driver doesn't accept a charset parameter either, so any
conversion that takes place will again be based on some charset that's
either forced by the current locale settings or hardcoded.


Well, are you aware of all these problems?

When will that change?


Thanks
  Sven
