On Sun, Mar 21, 2004 at 01:10:27PM -0800, Dean Arnold wrote:
> 2. The charset used for client<->server transfer syntax
> may not be the same as the internal storage charset
> of the DBMS (e.g., UTF8 may be used for transfer,
> but the target columns may be Latin1; attempting
> to insert a non-Latin1 compatible UTF8 character
> results in a DBMS error).

... or a silent conversion, either via transliteration or to some
substitute character.

> Assume I'm transferring data from DBMS A in Latin1
> to DBMS B in UTF8. Are there any guarantees that
> DBMS A will output its returned character data in UTF8?
> Or, going in the other direction, that DBMS A will
> recognize that the parameter data supplied from DBMS B
> is in UTF8, and will make the necessary conversion to Latin1
> before transfer?

I believe that you always have to tell the database server what
encoding your client holds the data in, and ask it to do the conversion
for you: via an environment variable (NLS_LANG), a command (SET
CLIENT_ENCODING, SET CHARACTER SET), or other means.

> Since there doesn't appear to be a way to "tag" a given
> Perl string with its charset encoding (other than UTF8),
> it would appear that normalization on UTF8 would be
> required, or, alternately, some add'l column/parameter
> metadata is needed to define the encoding, which (to my
> knowledge) does not yet exist in std. DBI metadata.

Do you mean on input or on output? As far as I know, DBD::Oracle and
DBD::Pg (with pg_enable_utf8) already mark the strings they return as
UTF-8 Perl strings (and DBI->trace will show them with double quotes).
So the internal Perl UTF-8 flag is the mechanism you are looking for.

> The DBI POD makes reference to using the defined locale...does
> that mean drivers should use $ENV{LANGUAGE} or $ENV{LANG}
> or $ENV{LC_ALL} or $ENV{LC_CTYPE} to determine the "normalized"
> charset, and fallback to UTF8 if none of the above are defined?

Since you cannot get native string handling in Perl in anything other
than US-ASCII and UTF-8 (for example, uc $string will not work if
$string holds binary characters in ISO-8859-3), I believe that using
UTF-8 is the way to go, with all other encodings yielding undocumented
results ...
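
To illustrate the uc point, here is a minimal sketch using only the core
Encode module. It assumes that octet 0xE6 is the lowercase c with
circumflex in ISO-8859-3, and that neither "use locale" nor the
unicode_strings feature is in effect; the variable names are made up
for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: 0xE6 is LATIN SMALL LETTER C WITH CIRCUMFLEX in ISO-8859-3.
my $bytes = "\xE6";                        # raw octets, UTF-8 flag off
my $chars = decode('ISO-8859-3', $bytes);  # Perl character string

# With no locale and no unicode_strings feature, uc applies only ASCII
# rules to a byte string, so the octet should come back unchanged.
my $uc_bytes = uc $bytes;

# On the decoded string, uc follows the Unicode case mapping.
my $uc_chars = uc $chars;

printf "flag on bytes:     %s\n", utf8::is_utf8($bytes) ? 'on' : 'off';
printf "flag on chars:     %s\n", utf8::is_utf8($chars) ? 'on' : 'off';
printf "uc(bytes) changed: %s\n", $uc_bytes ne $bytes ? 'yes' : 'no';
printf "uc(chars) changed: %s\n", $uc_chars ne $chars ? 'yes' : 'no';
```

A string that a driver hands back with the flag on should behave like
$chars here; data kept as raw ISO-8859-3 octets behaves like $bytes.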
-- 
------------------------------------------------------------------------
 Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
 .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...
 Only self-confident people can be simple.