Hello,

On Sun, Mar 15, 2009 at 1:35 PM, Greg Sabino Mullane <[email protected]> wrote:
>> When I fetch player names from the database above,
>> they don't seem to be recognized as UTF8:
>> ...
>> Can't DBD::Pg recognize that it's UTF8 data?
>
> You have not told us what version of DBI and DBD::Pg you
> are using. Please also provide a simple test case - it's
> hard to guess at what a program might be doing. Far better
> to provide some code.

ok, sorry. It's OpenBSD 4.3 + default perl and packages:

perl, v5.8.8 built for i386-openbsd
p5-DBD-Pg-1.49
p5-DBI-1.59
postgresql-client-8.2.6
postgresql-server-8.2.6

And here is my test case, the last printed line shows my problem:

http://pastebin.com/f6fc68309

$ cat dbi-utf.pl
#!/usr/bin/perl -w

use strict;
use utf8;
use DBI qw(:utils);
use Encode qw(encode_utf8 decode_utf8);

use constant HEARTS_HTML => pack 'U', 0x2665;
use constant X => 'phpbb';

my ($dbh, $ins1, $ins2, $sel1, $sel2, $href, $str1, $str2);

$dbh = DBI->connect('dbi:Pg:dbname=' . X, X, X, { RaiseError => 1});

$dbh->do('create table test1 (col1 integer, col2 varchar(50))');
$dbh->do('create table test2 (col3 integer, col4 text)');

$ins1 = $dbh->prepare('insert into test1 values (?, ?)');
$ins2 = $dbh->prepare('insert into test2 values (?, ?)');
$sel1 = $dbh->prepare('select * from test1 order by col1');
$sel2 = $dbh->prepare('select * from test2');

$ins1->execute(10, 'ABCDE');
$ins1->execute(20, 'АБВГД'); # the 1st 5 russian letters

$sel1->execute();
while ($href = $sel1->fetchrow_hashref()) {
    print "$href->{col1} $href->{col2}: " .
        data_string_desc($href->{col2}) . "\n";
    $str1 = "russian $href->{col2} russian";
    $str2 = HEARTS_HTML . "russian $href->{col2} russian" . HEARTS_HTML;
}

$ins2->execute(30, $str1);
$ins2->execute(40, $str2);

$sel2->execute();
while ($href = $sel2->fetchrow_hashref()) {
    print "$href->{col3} $href->{col4}: " .
        data_string_desc($href->{col4}) . "\n";
}

$dbh->do('drop table test1');
$dbh->do('drop table test2');

$ ./dbi-utf.pl
10 ABCDE: UTF8 off, ASCII, 5 characters 5 bytes
20 АБВГД: UTF8 off, non-ASCII, 10 characters 10 bytes
30 russian АБВГД russian: UTF8 off, non-ASCII, 26 characters 26 bytes
40 ♥russian Ð�Ð�Ð�Ð�Ð� russian♥: UTF8 off, non-ASCII, 42 characters 42 bytes

(the 3rd line is ok, but the last line is mangled)

I could test RHEL/CentOS 5.2 (at work) on Monday too

Regards
Alex
