On 2005-08-12 16:45:50 +0100, Charles Jardine wrote:
> The method $dbh->prepare($stmt) of DBD::Oracle ignores the
> state of the utf8 flag in the SV for $stmt. For example,
> after
>
> my $a = "select '\xe2' from dual";
> my $b = decode_utf8("select '\xc3\xa2' from dual");
>
> $a and $b compare equal with the perl 'eq' operator. However,
> their internal representations differ. $a is represented in
> iso-8859-1 and its utf8 flag is off. $b is represented in
> utf-8 and its utf8 flag is on.
>
> The two equal statements give different results when run
> using DBD::Oracle. Which gives the 'correct' result depends
> on the client-side database character set (the NLS_LANG
> charset). $a gives the correct result for 8-bit charsets.
> $b gives the correct result if the charset is utf-8.
>
> This is clearly a bug. It can affect any SQL statement which
> contains a non-ASCII character. It can strike whether or not
> Unicode is being used in the database. I would like to fix it.
> This requires that code be put somewhere which decides how to
> process the SV on the basis of its utf8 flag and of the
> NLS_LANG charset.

I am not sure it is possible to define the right behaviour.

Theoretically, it's simple. Since "\xe2" and decode_utf8("\xc3\xa2")
compare equal, they must be equal, so "\xe2" is always an "â",
regardless of the NLS_LANG setting. DBD::Oracle would then either convert
all strings to the client character set or quietly set the client
character set to AL32UTF8 (or UTF8 for older Oracle versions).
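
To make the "compare equal but are stored differently" point concrete, here is a small sketch that uses only core Perl and Encode, with no database involved. The variable names are mine, and the last step (encoding both strings to UTF-8) only illustrates the "theoretical" behaviour described above; it is not what DBD::Oracle does.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode);

# Same statement twice: once as a Latin-1 byte string, once decoded
# from UTF-8 octets (which turns the internal utf8 flag on).
my $octets = "select '\xc3\xa2' from dual";
my $latin  = "select '\xe2' from dual";
my $wide   = decode_utf8($octets);

# Perl compares characters, so the two strings are equal.
print "eq: ", ($latin eq $wide ? "yes" : "no"), "\n";

# The internal representations differ, though.
printf "utf8 flag: %d vs %d\n",
    (utf8::is_utf8($latin) ? 1 : 0),
    (utf8::is_utf8($wide)  ? 1 : 0);

# Encoding explicitly gives identical octets for both strings,
# independent of the flag.
my $bytes_latin = encode('UTF-8', $latin);
my $bytes_wide  = encode('UTF-8', $wide);
print "same octets: ", ($bytes_latin eq $bytes_wide ? "yes" : "no"), "\n";
```

A driver that looks only at the raw buffer sees two different byte sequences here, which is the problem described in the quoted message.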

However, I am sure there is plenty of existing code which expects Perl
strings to be byte arrays in the client character set. That code would
then break.

I propose the following compromise:

If the client character set is AL32UTF8 or UTF8, then all strings passed
to prepare, or inserted into or compared with varchar2 or clob fields,
are silently upgraded to utf8. All varchar2 and clob values returned by
DBD::Oracle are in utf8 representation.

Otherwise, all strings are treated as byte arrays in the client
character set.
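
A rough sketch of how that decision might look in Perl, for the outgoing direction only. octets_for_client is a hypothetical helper, not part of DBD::Oracle, and the charset is passed in by hand rather than read from NLS_LANG. Two things are my own assumptions: using Perl's 'UTF-8' encoding for both AL32UTF8 and UTF8 (Oracle's UTF8 treats supplementary characters differently), and downgrading flagged strings in the byte-array case, which the proposal leaves open.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the octets to hand to the client library
# for a given Perl string and client character set name.
sub octets_for_client {
    my ($string, $client_charset) = @_;

    if ($client_charset =~ /\A(?:AL32UTF8|UTF8)\z/i) {
        # Unicode client: treat the string as characters and send
        # UTF-8, whatever the state of the internal utf8 flag.
        return encode('UTF-8', $string);
    }

    # Any other charset: treat the string as a byte array that is
    # already in the client character set. Assumption: a flagged
    # string is downgraded if it fits into bytes, else we give up.
    my $copy = $string;
    utf8::downgrade($copy, 1)
        or die "String contains characters above 0xFF\n";
    return $copy;
}

# Example: an unflagged string sent through a Unicode client charset.
my $stmt = "select '\xe2' from dual";
printf "%vd\n", octets_for_client($stmt, 'AL32UTF8');
```

With this, "\xe2" and its decoded UTF-8 equivalent produce the same octets under AL32UTF8, while code that passes plain byte strings under an 8-bit charset keeps working unchanged.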

The documentation mentions in large, friendly letters that the character
set should be set to AL32UTF8 to get consistent behaviour.

hp
--
_ | Peter J. Holzer | In our modern say,learn,know in a day
|_|_) | Sysadmin WSR | world, perhaps being an expert is an
| | | [EMAIL PROTECTED] | outdated concept.
__/ | http://www.hjp.at/ | -- Catharine Drozdowski on dbi-users.
