I've snipped my original email and Tim's response and started again based on Tim's comments in the email and on IRC. So

o this concentrates on input to the database and ignores output.
o only looks so far at inserting into varchars (and not nvarchars which I currently believe works but we'll come back to that). o includes examples of different inputs, lengths in perl and what you get back. o test code and all discussion is windows only for now as ODBC drivers on Unix can work in very different ways. Also, test code run to MS SQL Server which in my experience is one of the best ODBC drivers for Windows. o you cannot correctly translate chr data from a client to a server using the SQL Server ODBC driver if the client code page differs from the server code page - see http://support.microsoft.com/kb/234748. o MS quite clearly state "If your server must store data from multiple code pages, the supported solution is to store the data in Unicode columns (NCHAR/NVARCHAR/NTEXT)." see http://support.microsoft.com/kb/234748 again.
o test code included at end of mail

A few things you should know:

1. D:O does not know what codepage you are in and there is nothing in ODBC which allows it to find that out or even say it matters. 2. I believe the TDS protocol (what MS SQL Server uses) allows a client code page to be sent, however I believe that is irrelevant now as the SQL Server driver converts your client code page chrs into unicode before sending them to SQL Server. 3. You cannot expect to get back chrs you inserted into char/varchar columns if you did not turn on AutoTranslate in the driver (which defaults to on). 4. D:O converts Perl unicode scalars to unicode on Windows using MultiByteToWideChar and setting CP_UTF8.

ok, so diving into the example, here is hopefully fairly self explanatory output from the test code (code at end of email). I've interleaved my comments starting with #.

Current active console code page: 1252
DBD::ODBC build for unicode:1
Output connstr: DSN=asus2;UID=sa;PWD=easysoft;APP=Strawberry Perl (64-bit);WSID=ASUS2;Network=DBMSSOCN;Address=ASUS2\SQLEXPRESS,1433

Inserting a unicode euro, utf8 flag on:
  input string: €
data_string_desc of input string: UTF8 on, non-ASCII, 1 characters 3 bytes
  ords of input string: 20ac,
  bytes of input string: e2,82,ac,
  database length: 3
data_string_desc of output string: UTF8 on, non-ASCII, 3 characters 7 bytes
  ords of output string:e2,201a,ac,
# what happened here is SQLDecribeParam described the parameter as SQL_CHAR and that is what D:O bound it as. The driver converted 82 in code page 1252 to 201a (a magic quote) and the database sees this as 3 chrs # the probably mistake here is that D:O should have looked at the perl data, seen it was unicode and not bound it as SQL_CHAR (but SQL_WCHAR) even though SQLDescribeParam said it was

Inserting a UTF-8 encoded unicode euro, utf8 flag off:
"\x{0082}" does not map to cp1252 at test_code.pl line 36.
# the above resulted from a print to stdout in windows-1252
# you'd expect that since we were outputting bytes to the terminal
  input string: â\x{0082}¬
data_string_desc of input string: UTF8 off, non-ASCII, 3 characters 3 bytes
  ords of input string: e2,82,ac,
  bytes of input string: e2,82,ac,
  database length: 3
data_string_desc of output string: UTF8 on, non-ASCII, 3 characters 7 bytes
  ords of output string:e2,201a,ac,
# same thing as first case happened here except D:O couldn't have done anything else as Perl said the scalar was not unicode and SQL Server said the parameter was SQL_CHAR

Inserting a unicode euro, utf8 flag on, forced SQL_WVARCHAR:
  input string: €
data_string_desc of input string: UTF8 on, non-ASCII, 1 characters 3 bytes
  ords of input string: 20ac,
  bytes of input string: e2,82,ac,
  database length: 1
data_string_desc of output string: UTF8 on, non-ASCII, 1 characters 3 bytes
  ords of output string:20ac,
# the above is arguably what should have happened in the first test case i.e., D:O should have bound as SQL_WVARCHAR because the perl data was unicode. I included this one to show you what happens if you do it right.

Inserting a unicode U+187 which is not in the current code page:
"\x{0187}" does not map to cp1252 at test_code.pl line 36.
# you'd expect this warning here on printing to stdout
  input string: \x{0187}
data_string_desc of input string: UTF8 on, non-ASCII, 1 characters 2 bytes
  ords of input string: 187,
  bytes of input string: c6,87,
  database length: 2
data_string_desc of output string: UTF8 on, non-ASCII, 2 characters 5 bytes
  ords of output string:c6,2021,
# here again, down to D:O binding as SQL_CHAR, SQL Server interpreted this as 2 chrs in the client code page, and converted them to chr c6 and a double dagger

Inserting a unicode U+187 which is not in the current code page with forced binding:
"\x{0187}" does not map to cp1252 at test_code.pl line 36.
  input string: \x{0187}
data_string_desc of input string: UTF8 on, non-ASCII, 1 characters 2 bytes
  ords of input string: 187,
  bytes of input string: c6,87,
  database length: 1
  data_string_desc of output string: UTF8 off, ASCII, 1 characters 1 bytes
  ords of output string:3f,
# character U+187 won't fit in codepage 1252 so sql server converted it to a ?

So, I'm thinking the problem above is D:O ignores utf8 flag on parameters when they are bound and uses whatever SQLDescribeParam says instead (SQL_CHAR in case of varchar columns). If it didn't ignore the utf8 flag here, it would have to rebind on every execute (which it may already do, I didn't check).

test code:

=====
use 5.014;
use strict;
use warnings;
use DBI qw(:utils :sql_types);
use Encode qw(encode is_utf8);
use Win32::API;

binmode STDOUT, ":encoding(cp1252)";

sub show_it {
    my $h = shift;
    my $r = $h->selectrow_arrayref(q/select len(a), a from test/);
    say "  database length: ", $r->[0];
    say "  data_string_desc of output string: ", data_string_desc($r->[1]);
    print "  ords of output string:";
    foreach my $s(split(//, $r->[1])) {
        print sprintf("%x", ord($s)), ",";
    }
    print "\n";
    $h->do(q/delete from test/);
}

sub execute {
    my ($s, $string) = @_;

    my $bytes;
    if (is_utf8($string)) {
        $bytes = encode("UTF-8", $string);
    } else {
        $bytes = $string;
    }
    say "  input string: $string";
    say "  data_string_desc of input string: ", data_string_desc($string);
    print "  ords of input string: ";
    foreach my $s(split(//, $string)) {
        print sprintf("%x,", ord($s));
    }
    print "\n";

    print "  bytes of input string: ";
    foreach my $s(split(//, $bytes)) {
        print sprintf("%x,", ord($s));
    }
    print "\n";

    $s->execute($string);
}

sub set_codepage {
    # get active codepage and ensure it is cp1252
# http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using
    Win32::API::More->Import("kernel32", "UINT GetConsoleOutputCP()");
    my $cp = GetConsoleOutputCP();
    print "Current active console code page: $cp\n";
    if ($cp != 1252) {
        print "Please change to codepage 1252 - run chcp 1252\n";
die "Incompatible active codepage - please change to codepage 1252 by running chcp 1252\n";
    }
}

set_codepage();

my $h = DBI->connect();
say "DBD::ODBC build for unicode:", $h->{odbc_has_unicode};
say "Output connstr: ", $h->{odbc_out_connect_string};

my $s;
my $sql = q/insert into test (a) values(?)/;

eval {$h->do(q/drop table test/)};
$h->do(q/create table test (a varchar(100) collate Latin1_General_CI_AS)/);

# a simple unicode string
my $euro = "\x{20ac}";
say "Inserting a unicode euro, utf8 flag on:";
$s = $h->prepare($sql); # redo to ensure no sticky params
execute($s, $euro);
show_it($h);

# a simple unicode string first encoded in UTF-8
my $enc = encode("UTF-8", $euro);
say "Inserting a UTF-8 encoded unicode euro, utf8 flag off:";
$s = $h->prepare($sql); # redo to ensure no sticky params
execute($s, $enc);
show_it($h);

# a simple unicode string forced to be sent as SQL_WVARCHAR
say "Inserting a unicode euro, utf8 flag on, forced SQL_WVARCHAR:";
$s = $h->prepare($sql); # redo to ensure no sticky params
$s->bind_param(1, undef, {TYPE => SQL_WVARCHAR});
execute($s, $euro);
show_it($h);

# a unicode string containing a character that is not in the column codepage
my $question = "\x{187}";
say "Inserting a unicode U+187 which is not in the current code page:";
$s = $h->prepare($sql); # redo to ensure no sticky params
execute($s, $question);
show_it($h);

# a unicode string containing a character that is not in the column codepage but forced binding
my $question = "\x{187}";
say "Inserting a unicode U+187 which is not in the current code page with forced binding:";
$s = $h->prepare($sql); # redo to ensure no sticky params
$s->bind_param(1, undef, {TYPE => SQL_WVARCHAR});
execute($s, $question);
show_it($h);

$h->disconnect;
=====

Martin
--
Martin J. Evans
Wetherby, UK

Reply via email to