Michael, When reporting issues with DBD::ODBC it is useful to know the operating system name and version and the ODBC driver and version as this can have a big effect.
On 15/09/10 01:18, Michael Ludwig wrote: > I'm facing encoding issues in trying to make use of the XML datatype in > SQL Server 2005, which I'm accessing using DBD::ODBC 1.23 and Perl 5.12. > > CREATE TABLE T2 (a VARCHAR(99), u NVARCHAR(99), x XML); > > Three columns here, a for single-byte characters, u for Unicode, and > x for XML. The XML type is not known to DBD::ODBC as a unicode type. You can override this by adding a TYPE attribute of value -8 (sorry there is no symbol for SQL_WCHAR in DBI at present - that might be worth RTing in itself) to the end of the bind_param call - see later for my example. There is an argument that the XML type should be known as a unicode type in DBD::ODBC and I'd consider that if you care to RT it. > The following statements works correctly in SSMS (SS Management Studio): > > INSERT INTO T2 VALUES ('Käse', N'Käse', CAST( '<d>Käse</d>' AS XML)); > > Käse Käse <d>Käse</d> > > Now German wasn't too difficult, so let's try some Russian. > > INSERT INTO T2 VALUES > ('Москва', N'Москва', CAST('<r>Москва</r>' AS XML)); > INSERT INTO T2 VALUES > ('Москва', N'Москва', CAST(N'<r>Москва</r>' AS XML)); > > ?????? Москва <r>??????</r> > ?????? Москва <r>Москва</r> I think you would have found the XML column was incorrectly inserted if you'd viewed this in SQL Server Enterprise manager - I did. > We need the N introducer for Unicode literals and a column type capable > of receiving Unicode data. Failing any of those two, we're getting just > a series of substitution characters (?). Better to use bind_param and set the type than try and set the type in the SQL. > How can we handle this situation from Perl? Here's a script. > > \,,,/ > (o o) > ------oOOo-(_)-oOOo------ > use strict; > use warnings; no warnings 'uninitialized'; > use utf8; > use DBI; > > my $txt_de = 'Käse'; > my $txt_ru = 'Москва'; > > binmode STDOUT, ':utf8'; > > my @dsn = qw/DBI:ODBC:MY_DB my_username my_password/; > my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1); > my $dbh = DBI->connect( @dsn, \%opt ); > $dbh->{LongReadLen} = 4000; > $dbh->{LongTruncOk} = 1; # Dies muß, ob logisch oder nicht. > > my $sth_ins = $dbh->prepare( > 'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' ); > $sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" ); > $sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" ); > > my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' ); > $sth_sel->execute; > $sth_sel->bind_columns( \my( $txt, $xml ) ); > my $i = 0; > while ( $sth_sel->fetch ) { > printf "%3u %3u [%s] [%s]\n", ++$i, length($txt), $txt, $xml; > } > $dbh->disconnect; > ------------------------- > > The problem is the INSERT statement, more specifically, in the XML part. > I can't seem to get it to accept my Unicode strings as Unicode for that > column. Instead, they're treated as octets, resulting in garbage. > > Käse Käse <d>Käse</d> > МоÑква Москва <r>МоÑква</r> > > I've tried to use an N introducer with the XML column, but that leads to > errors. > > CAST( N? AS XML) - Invalid column name 'n...@p3'. (SQL-42S22) > CAST( N ? AS XML) - Incorrect syntax near '@P3'. (SQL-42000) > > I'd appreciate your advice. There is good news and some not so good news (unless you can build DBD::ODBC for your platform). To fix the insert bind the parameter as SQL_WCHAR (-8). However, the select cannot be fixed with the same trick as DBD::ODBC does not currently act on the TYPE attribute to bind_col. I've fixed that and can send you a new DBD::ODBC but you'll have to build it yourself (which is easy if you are UNIX or using Strawberry Perl and only slightly harder if you are using ActiveState on Windows assuming it is a recentish version). By all means RT a request to default XML columns to unicode and that will mean you don't have to specify the TYPE in bind_param. Below is my example which may also indicate another subtle error with length() but I've not had time to look into it yet. use strict; use warnings; no warnings 'uninitialized'; use utf8; use DBI; use bytes; no bytes; my $txt_de = 'Käse'; my $txt_ru = 'Москва'; binmode STDOUT, ':utf8'; my @dsn = qw/DBI:ODBC:xxx xx xx/; my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1, ChopBlanks => 1); my $dbh = DBI->connect( @dsn, \%opt ); $dbh->{LongReadLen} = 4000; $dbh->{LongTruncOk} = 1; # Dies muß, ob logisch oder nicht. my $sth_ins = $dbh->prepare( 'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' ); foreach my $row ([$txt_de, $txt_de, "<d>$txt_de</d>"], [$txt_ru, $txt_ru, "<r>$txt_ru</r>"]) { $sth_ins->bind_param(1, $row->[0]); $sth_ins->bind_param(2, $row->[1]); # defaults to SQL_WCHAR $sth_ins->bind_param(3, $row->[2], {TYPE => -8}); $sth_ins->execute; } #$sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" ); #$sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" ); my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' ); $sth_sel->execute; # TYPE not required for column 1 really as DBD::ODBC knows it is unicode $sth_sel->bind_col(1, \my $txt, {TYPE => -8}); $sth_sel->bind_col(2, \my $xml, {TYPE => -8}); #$sth_sel->bind_columns( \my( $txt, $xml ) ); my $i = 0; while ( $sth_sel->fetch ) { printf "%3u %3u %3u %s [%s] [%s]\n", ++$i, length($txt), bytes::length($txt), (utf8::is_utf8($txt) ? ' utf8' : '!utf8'), $txt, $xml; # NOTE, if I don't reset $txt each iteration the length() call returns # the wrong answer - that might also be a bug. #$txt = ''; } $dbh->disconnect; Martin -- Martin J. Evans Easysoft Limited http://www.easysoft.com