"Andy Hassall" <[EMAIL PROTECTED]> writes:
> 1b. If a Perl string with the utf8 flag is bound to a statement, it
> is bound as UTF8 rather than the client character set. Otherwise it is bound
> as normal (in the client character set).
Please do not do this. I will try to explain why.
In Perl, the utf8 flag shouldn't carry any semantics; it should be
purely a matter of the internal representation of the string. Thus it is
perfectly possible for two strings to be equal even though one has the
utf8 flag set and the other has it cleared:
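use strict;
use warnings;
use Encode ();
use Devel::Peek;
# Three Latin-1 characters, stored internally as one byte each (flag off).
my $bytes = "\xe6\xf8\xe5";
# The same characters after a UTF-8 round trip; decode_utf8 sets the flag.
my $utf8s = Encode::decode_utf8(Encode::encode_utf8($bytes));
Dump($bytes);
Dump($utf8s);
print $bytes eq $utf8s ? "Equal\n" : "Not equal\n";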
The output is
SV = PV(0x811beb0) at 0x81268c4
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x8121a50 "\346\370\345"\0
  CUR = 3
  LEN = 4
SV = PV(0x81a233c) at 0x81268e8
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
  PV = 0x81584c8 "\303\246\303\270\303\245"\0 [UTF8 "\x{e6}\x{f8}\x{e5}"]
  CUR = 6
  LEN = 7
Equal
So $bytes and $utf8s are the _same_ string, consisting of the same three
characters, though Perl internally stores it in different ways.
Therefore, both strings should work the same way (by default at least)
when bound in DBI. To quote from perldoc Encode: "This utf8 flag is not
visible in perl scripts". Therefore it should not become visible through
the use of DBI.
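
A minimal sketch of the same point, using only the built-in utf8::
functions (the variable names are illustrative). It flips the internal
representation of one copy of a string and checks that nothing visible
at the Perl level changes except the flag itself:

```perl
use strict;
use warnings;

# Three Latin-1 characters, stored as one byte each (flag off).
my $downgraded = "\xe6\xf8\xe5";
my $upgraded   = $downgraded;

# utf8::upgrade changes only the internal representation (it sets the
# flag); the characters in the string stay the same.
utf8::upgrade($upgraded);

printf "flag on first copy:  %s\n", utf8::is_utf8($downgraded) ? "on" : "off";
printf "flag on second copy: %s\n", utf8::is_utf8($upgraded)   ? "on" : "off";
printf "same length:         %s\n",
    length($downgraded) == length($upgraded) ? "yes" : "no";
printf "eq:                  %s\n", $downgraded eq $upgraded ? "yes" : "no";
```

If binding depended on the flag, these two copies would reach the
database differently even though Perl itself treats them as equal.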
I haven't followed the discussion closely, but I believe the core of the
problem is that some (old?) code may bind strings as sequences of bytes
in the database character set, whereas other (new?) code binds strings
as sequences of Unicode characters. As far as I can see, there is no way
for DBI to reliably distinguish between these two situations. The user
will have to tell it one way or the other (whether by handle attribute,
bind option, or defaults based on environment/database config).
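
To illustrate what "the user tells" could look like, here is a rough
sketch. The helper bytes_to_bind and its $charset argument are
hypothetical and not part of DBI or any driver. They only stand in for
whatever attribute or option ends up carrying that information. The
point is that once the charset is stated explicitly, the flag no longer
affects the bytes produced:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not a DBI API): turn a string of characters into
# the bytes to send, using a charset the caller states explicitly.
sub bytes_to_bind {
    my ($string, $charset) = @_;
    return Encode::encode($charset, $string);
}

# The same three characters in both internal representations.
my $flag_off = "\xe6\xf8\xe5";
my $flag_on  = $flag_off;
utf8::upgrade($flag_on);

for my $charset ('ISO-8859-1', 'UTF-8') {
    my $from_off = bytes_to_bind($flag_off, $charset);
    my $from_on  = bytes_to_bind($flag_on,  $charset);
    printf "%-10s %d bytes, identical: %s\n",
        $charset, length($from_off), $from_off eq $from_on ? "yes" : "no";
}
```

Code that really means "these are raw bytes in the database character
set" would skip the encoding step, and that too is a choice only the
caller can make.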
- Kristian.
--
Kristian Nielsen [EMAIL PROTECTED]
Development Manager, Sifira A/S