Michael,

When reporting issues with DBD::ODBC it is useful to know the operating system 
name and version and the ODBC driver and version as this can have a big effect.

On 15/09/10 01:18, Michael Ludwig wrote:
> I'm facing encoding issues in trying to make use of the XML datatype in
> SQL Server 2005, which I'm accessing using DBD::ODBC 1.23 and Perl 5.12.
> 
>   CREATE TABLE T2 (a VARCHAR(99), u NVARCHAR(99), x XML);
> 
> Three columns here, a for single-byte characters, u for Unicode, and
> x for XML.

The XML type is not known to DBD::ODBC as a unicode type. You can override this 
by adding a TYPE attribute of value -8 (sorry there is no symbol for SQL_WCHAR 
in DBI at present - that might be worth RTing in itself) to the end of the 
bind_param call - see later for my example. There is an argument that the XML 
type should be known as a unicode type in DBD::ODBC and I'd consider that if 
you care to RT it. 

> The following statements works correctly in SSMS (SS Management Studio):
> 
>   INSERT INTO T2 VALUES ('Käse', N'Käse', CAST( '<d>Käse</d>' AS XML));
> 
>   Käse    Käse    <d>Käse</d>
> 
> Now German wasn't too difficult, so let's try some Russian.
> 
>   INSERT INTO T2 VALUES
>   ('Москва', N'Москва', CAST('<r>Москва</r>' AS XML));
>   INSERT INTO T2 VALUES
>   ('Москва', N'Москва', CAST(N'<r>Москва</r>' AS XML));
> 
>   ??????  Москва  <r>??????</r>
>   ??????  Москва  <r>Москва</r>

I think you would have found the XML column was incorrectly inserted if you'd 
viewed this in SQL Server Enterprise manager - I did.

> We need the N introducer for Unicode literals and a column type capable
> of receiving Unicode data. Failing any of those two, we're getting just
> a series of substitution characters (?).

Better to use bind_param and set the type than try and set the type in the SQL.
 
> How can we handle this situation from Perl? Here's a script.
> 
>           \,,,/
>           (o o)
> ------oOOo-(_)-oOOo------
> use strict;
> use warnings; no warnings 'uninitialized';
> use utf8;
> use DBI;
> 
> my $txt_de = 'Käse';
> my $txt_ru = 'Москва';
> 
> binmode STDOUT, ':utf8';
> 
> my @dsn = qw/DBI:ODBC:MY_DB my_username my_password/;
> my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1);
> my $dbh = DBI->connect( @dsn, \%opt );
> $dbh->{LongReadLen} = 4000;
> $dbh->{LongTruncOk} = 1; # Dies muß, ob logisch oder nicht.
> 
> my $sth_ins = $dbh->prepare(
>   'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' );
> $sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" );
> $sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" );
> 
> my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' );
> $sth_sel->execute;
> $sth_sel->bind_columns( \my( $txt, $xml ) );
> my $i = 0;
> while ( $sth_sel->fetch ) {
>   printf "%3u %3u [%s] [%s]\n", ++$i, length($txt), $txt, $xml;
> }
> $dbh->disconnect;
> -------------------------
> 
> The problem is the INSERT statement, more specifically, in the XML part.
> I can't seem to get it to accept my Unicode strings as Unicode for that
> column. Instead, they're treated as octets, resulting in garbage.
> 
>   Käse   Käse    <d>Käse</d>
>   Москва Москва  <r>Москва</r>
> 
> I've tried to use an N introducer with the XML column, but that leads to
> errors.
> 
>   CAST( N? AS XML)      - Invalid column name 'n...@p3'. (SQL-42S22)
>   CAST( N ? AS XML)     - Incorrect syntax near '@P3'. (SQL-42000)
> 
> I'd appreciate your advice.

There is good news and some not so good news (unless you can build DBD::ODBC 
for your platform).

To fix the insert bind the parameter as SQL_WCHAR (-8).

However, the select cannot be fixed with the same trick as DBD::ODBC does not 
currently act on the TYPE attribute to bind_col. I've fixed that and can send 
you a new DBD::ODBC but you'll have to build it yourself (which is easy if you 
are UNIX or using Strawberry Perl and only slightly harder if you are using 
ActiveState on Windows assuming it is a recentish version).

By all means RT a request to default XML columns to unicode and that will mean 
you don't have to specify the TYPE in bind_param.

Below is my example which may also indicate another subtle error with length() 
but I've not had time to look into it yet.

use strict;
use warnings; no warnings 'uninitialized';
use utf8;
use DBI;
use bytes;

no bytes;
my $txt_de = 'Käse';
my $txt_ru = 'Москва';

binmode STDOUT, ':utf8';

my @dsn = qw/DBI:ODBC:xxx xx xx/;
my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1, ChopBlanks => 1);
my $dbh = DBI->connect( @dsn, \%opt );
$dbh->{LongReadLen} = 4000;
$dbh->{LongTruncOk} = 1; # Dies muß, ob logisch oder nicht.

my $sth_ins = $dbh->prepare(
  'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' );

foreach my $row ([$txt_de, $txt_de, "<d>$txt_de</d>"],
                 [$txt_ru, $txt_ru, "<r>$txt_ru</r>"]) {
    $sth_ins->bind_param(1, $row->[0]);
    $sth_ins->bind_param(2, $row->[1]); # defaults to SQL_WCHAR
    $sth_ins->bind_param(3, $row->[2], {TYPE => -8});
    $sth_ins->execute;
}
#$sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" );
#$sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" );

my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' );
$sth_sel->execute;
# TYPE not required for column 1 really as DBD::ODBC knows it is unicode
$sth_sel->bind_col(1, \my $txt, {TYPE => -8});
$sth_sel->bind_col(2, \my $xml, {TYPE => -8});
#$sth_sel->bind_columns( \my( $txt, $xml ) );
my $i = 0;
while ( $sth_sel->fetch ) {
  printf "%3u %3u %3u %s [%s] [%s]\n",
      ++$i, length($txt), bytes::length($txt),
          (utf8::is_utf8($txt) ? ' utf8' : '!utf8'), $txt, $xml;
  # NOTE, if I don't reset $txt each iteration the length() call returns
  # the wrong answer - that might also be a bug.
  #$txt = '';
}
$dbh->disconnect;

Martin
-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

Reply via email to