(Note: I'm sending this to both -users and -dev, since I'm not certain which list it belongs on at this point.)

I'm building a heterogeneous multidatabase tool, and am curious as to how various DBDs are handling character sets. I'll try to enumerate the issues (I'm ignoring locale issues for now):

1. Different DBMSs have varying levels of support for charset encodings.
2. The charset used for client<->server transfer syntax may not be the same as the internal storage charset of the DBMS (e.g., UTF8 may be used for transfer, but the target columns may be Latin1; attempting to insert a non-Latin1 compatible UTF8 character results in a DBMS error).
3. DBDs may also have varying levels of charset support.
4. Perl 5.8 has (essentially) standardized on UTF8 encodings.
5. Applications may or may not be "UTF8 aware".

Assume I'm transferring data from DBMS A in Latin1 to DBMS B in UTF8. Are there any guarantees that DBMS A will output its returned character data in UTF8? Or, going in the other direction, that DBMS A will recognize that the parameter data supplied from DBMS B is in UTF8, and will make the necessary conversion to Latin1 before transfer?

Since there doesn't appear to be a way to "tag" a given Perl string with its charset encoding (other than UTF8), it would appear that normalization on UTF8 would be required, or, alternately, some additional column/parameter metadata is needed to define the encoding, which (to my knowledge) does not yet exist in the standard DBI metadata.

The DBI POD makes reference to using the defined locale...does that mean drivers should use $ENV{LANGUAGE} or $ENV{LANG} or $ENV{LC_ALL} or $ENV{LC_CTYPE} to determine the "normalized" charset, and fall back to UTF8 if none of the above are defined?

Is there a consistent charset encoding behavior defined for DBI at this time? If so, have most/all DBDs conformed to that behavior? If not, is a rule wrt charset encoding behavior needed?

I'm probably babbling and confused at this point, so I'd appreciate any clarification that might be provided.
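
To make the A-to-B case concrete, here is a minimal sketch of what application-level normalization could look like if the drivers do no conversion themselves. It assumes each driver simply hands back (and accepts) raw octets in the charset its DBMS uses, which is exactly the behavior I'm unsure about. The helper names latin1_to_utf8 and utf8_to_latin1 are hypothetical and not part of DBI or any DBD; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: Latin1 octets (as fetched from DBMS A) to
# UTF8 octets (as expected by DBMS B). Every Latin1 octet maps to a
# Unicode character, so this direction cannot fail.
sub latin1_to_utf8 {
    my ($octets) = @_;
    my $chars = decode('iso-8859-1', $octets);   # octets -> Perl characters
    return encode('UTF-8', $chars);              # characters -> UTF8 octets
}

# Hypothetical helper: UTF8 octets (from DBMS B) to Latin1 octets
# (for DBMS A). Dies if the input is malformed UTF8 or contains a
# character with no Latin1 representation.
sub utf8_to_latin1 {
    my ($octets) = @_;
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $chars = decode('UTF-8', $octets, $check);
    return encode('iso-8859-1', $chars, $check);
}

# "cafe" with e-acute: one octet in Latin1, two octets in UTF8.
my $for_b = latin1_to_utf8("caf\xE9");           # "caf\xC3\xA9"

# The euro sign (U+20AC) has no Latin1 mapping, so this conversion dies,
# much as the DBMS itself would reject the insert in issue 2 above.
my $ok = eval { utf8_to_latin1("\xE2\x82\xAC"); 1 };
print "conversion to Latin1 failed: $@" unless $ok;
```

The point of the sketch is that the application has to know, out of band, which encoding each side uses; nothing in the fetched string itself says so, which is why some column/parameter metadata seems necessary.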

If a list of charset behaviors for each DBD is needed, I'd be happy to put one together, assuming the DBD authors send me the details for each driver.

TIA,
Dean Arnold
Presicient Corp.
www.presicient.com