(Note: I'm sending this to both -users and -dev, I'm not
certain which it belongs to at this point)

I'm building a heterogeneous multidatabase tool,
and am curious as to how the various DBDs are handling
character sets.

I'll try to enumerate the issues (I'm ignoring locale
issues for now):

1. Different DBMS's have varying levels of support
for charset encodings.

2. The charset used for client<->server transfer syntax
may not be the same as the internal storage charset
of the DBMS (e.g., UTF8 may be used for transfer,
but the target columns may be Latin1; attempting
to insert a UTF8 character that has no Latin1 equivalent
results in a DBMS error).

3. DBD's may also have varying levels of charset
support.

4. Perl 5.8 has (essentially) standardized on UTF8
encodings.

5. Applications may or may not be "UTF8 aware".
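To make points 4 and 5 concrete, here is a minimal sketch of how Perl 5.8 distinguishes a string of raw bytes from a string of characters. It uses only the core Encode module and the built-in utf8::is_utf8(); the helper name string_kind is made up for illustration. The point is that the same text can reach an application in either form, and the UTF8 flag is the only marker Perl keeps.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of DBI or any DBD): report how Perl
# itself sees a scalar, based only on the internal UTF8 flag.
sub string_kind {
    my ($s) = @_;
    return utf8::is_utf8($s) ? 'characters (UTF8 flag on)'
                             : 'bytes or unflagged';
}

# Four bytes, as a Latin1 DBMS might hand them back through a driver.
my $latin1_bytes = "caf\xE9";

# The same text after an explicit decode into Perl characters.
my $chars = decode('ISO-8859-1', $latin1_bytes);

print string_kind($latin1_bytes), "\n";
print string_kind($chars), "\n";
```

Nothing in the first scalar records that it is Latin1 rather than some other single-byte charset; that knowledge has to come from outside the string.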

Assume I'm transferring data from DBMS A in Latin1
to DBMS B in UTF8. Are there any guarantees that
DBMS A will output its returned character data in UTF8?
Or, going in the other direction, that DBMS A will
recognize that the parameter data supplied from DBMS B
is in UTF8, and will make the necessary conversion to Latin1
before transfer?
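If neither the driver nor the DBMS makes that guarantee, the application has to convert explicitly. The following is only a sketch of that fallback using the core Encode module; the helper names are invented for illustration, and it assumes the data really arrives as Latin1 (or UTF8) bytes, which is exactly what is uncertain here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: Latin1 bytes fetched from DBMS A become
# UTF-8 bytes suitable for DBMS B.
sub latin1_to_utf8_bytes {
    my ($latin1_bytes) = @_;
    my $chars = decode('ISO-8859-1', $latin1_bytes);  # bytes -> characters
    return encode('UTF-8', $chars);                   # characters -> UTF-8 bytes
}

# Hypothetical helper for the reverse direction. FB_CROAK makes the
# conversion die when a character has no Latin1 representation,
# instead of silently substituting a placeholder.
sub utf8_to_latin1_bytes {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK);
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK);
}

# Usage: e-acute survives the round trip; a character outside
# Latin1 is rejected on the client before the DBMS ever sees it.
my $utf8 = latin1_to_utf8_bytes("\xE9");
my $back = utf8_to_latin1_bytes($utf8);
my $ok   = eval { utf8_to_latin1_bytes("\xE2\x98\xBA"); 1 };
print "rejected: $@" unless $ok;
```

Failing on the client side like this mirrors the DBMS error described in point 2, but it only helps if the application knows which encoding each side actually uses.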

Since there doesn't appear to be a way to "tag" a given
Perl string with its charset encoding (other than UTF8),
it would appear that normalization on UTF8 would be
required, or, alternately, some add'l column/parameter
metadata is needed to define the encoding, which (to my
knowledge) does not yet exist in std. DBI metadata. 
The DBI POD makes reference to using the defined locale...does
that mean drivers should use $ENV{LANGUAGE} or $ENV{LANG}
or $ENV{LC_ALL} or $ENV{LC_CTYPE} to determine the "normalized"
charset, and fall back to UTF8 if none of the above are defined?
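For reference, the usual POSIX precedence for character handling is LC_ALL, then LC_CTYPE, then LANG; LANGUAGE is a GNU gettext variable that affects message translation only. A rough sketch of what a locale-based lookup could look like follows. The function name charset_from_env is hypothetical, nothing here is taken from DBI or any DBD, and the UTF-8 fallback is just the assumption floated above, not defined behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a charset name from locale-style
# environment settings, passed in as a hash so it can be tested
# without touching the real %ENV.
sub charset_from_env {
    my (%env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $val = $env{$var};
        next unless defined $val && length $val;
        # Locale names look like lang_TERRITORY.codeset@modifier;
        # capture the codeset between '.' and an optional '@'.
        return $1 if $val =~ /\.([^@]+)/;
        last;   # the winning variable names no codeset (e.g. "C")
    }
    return 'UTF-8';   # assumed fallback, not a DBI rule
}

print charset_from_env(%ENV), "\n";
```

On platforms that support it, the core I18N::Langinfo module's langinfo(CODESET) asks the C library for the same information and is likely the more robust route; the version above only shows the precedence question in code.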

Is there a consistent charset encoding behavior defined for
DBI at this time ? If so, have most/all DBD's conformed to
that behavior ? If not, is a rule wrt charset encoding behavior
needed ? 

I'm probably babbling and confused at this point, so I'd appreciate
any clarification that might be provided.
If a list of charset behaviors for each DBD is needed,
I'd be happy to put one together, assuming the DBD authors
send me the details for each driver.

TIA,
Dean Arnold
Presicient Corp.
www.presicient.com
