I seem to have identified the difference between selecting from nchar and nvarchar columns containing unicode data:
for nvarchar: col 1: dbtype 1, scale 0, prec 512, nullok 1, name A : dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512 for ncahr: col 1: dbtype 96, scale 0, prec 512, nullok 1, name A : dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512 and for nvarchar, DBD::Oracle makes the call: calling OCIAttrSet OCI_ATTR_CHARSET_FORM with csform=2 OCIAttrSet(83423a0,OCI_HTYPE_DEFINE,bfffed87,0,32,833c858)=SUCCESS If I change oci8.c around line 1316 which was: #ifdef OCI_ATTR_CHARSET_FORM if ( (fbh->dbtype == 1) ) { /* && (fbh->csform == SQLCS_NCHAR) && CS_IS_UTF8(ncharsetid) ) { */ /* ok... after doing what tim asked: setting SvUTF8 strictly based on csid 8bit Nchar test was broken and this currently effectively just sets Attrs to the values in fhb ignoring ncharsetid altogether probably wrong */ to: #ifdef OCI_ATTR_CHARSET_FORM if ( (fbh->dbtype == 1) || (fbh->csform == SQLCS_NCHAR)) { I seem to get my data back as I expected. I don't know much about DBD::Oracle so the comment in the code does not help me that much but it looks as though the code used to be fairly similar to how I changed it. So, my question now becomes, should I be able to store unicode data in an nchar (which I can with stock DBD::Oracle 1.16) and expect to be able to get it back again as unicode data? Is DBD::Oracle working correctly or should I have done something else? Martin -- Martin J. Evans Easysoft Ltd, UK Development On 19-Jul-2005 Martin J. Evans wrote: > Hi, > > I have some further information on my problem. The insertion of UTF-8 data > into > nchar seems to work like it does for nvarchar2; it appears it is the > selecting > from nchar fields that is not returning the data inserted. I have verified > this > by inserting the same data into nchar and nvarchar fields and reading it via > Oracle's ODBC driver in MS Access - where it looks fine. But, retrieving it > in > Perl does not display the same results. > > Martin > -- > Martin J. Evans > Easysoft Ltd, UK > Development > > > On 18-Jul-2005 Martin J. Evans wrote: >> Hi, >> >> Wondered if anyone can tell me if I've misunderstood something here. I am >> able >> to insert unicode data into nvarchar2 and get back what was inserted but >> same >> code with nchar column does not work. >> >> OCI version is 10.1.0.2 >> Database is Oracle 10.2 on Linux >> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16 >> (Unicode) >> This is perl, v5.8.7 built for i686-linux >> DBD::Oracle 1.16. >> >>#!/disk2/martin/perl/bin/perl >> use charnames ':full'; >> use DBI; >> use Encode; >> use DBD::Oracle qw( :ora_types ORA_OCI SQLCS_NCHAR ); >> use Devel::Peek; >> >> if ( $] < 5.008) { >> print "Need at least Perl 5.8\n"; >> exit 1; >> } >> binmode(STDOUT, ":utf8"); >> >> print "<html><head>"; >> print q|<meta http-equiv="Content-Type" content="text/html;charset=utf-8">|; >> print "<title>test</title></head><body>\n"; >> >> $ENV{'ORACLE_HOME'} = '/home/oracle/product/10.1.0/db_2'; >> $ENV{'NLS_NCHAR'} = 'AL32UTF8'; >> my $dbh = DBI->connect('dbi:Oracle:tentwo.testing', 'xxx', 'yyy'); >> $dbh->{ChopBlanks} = 1; >> print "OCI version is ", ORA_OCI, "\n"; >> print "ora_can_unicode = ", $dbh->ora_can_unicode, "\n"; >> if (!($dbh->ora_can_unicode & 1)) { >> print "Database does not support UNICODE\n"; >> exit 1; >> } >> my $paramsH = $dbh->ora_nls_parameters(); >> printf "Database $ora_server_version CHAR set is %s (%s), NCHAR set is %s >> (%s)\n", >> $paramsH->{NLS_CHARACTERSET}, >> $dbh->ora_can_unicode & 2 ? "Unicode" : "Non-Unicode", >> $paramsH->{NLS_NCHAR_CHARACTERSET}, >> $dbh->ora_can_unicode & 1 ? "Unicode" : "Non-Unicode"; >> eval { >> $dbh->do('drop table martin'); >> }; >># NOTE nchar does not work!!!! (why?) >> $dbh->do('create table martin (a nvarchar2(256))'); >> my $sql = "insert into martin values (?)"; >> my $x = "\N{EURO SIGN}\x{20ac}\N{LATIN CAPITAL LETTER A WITH ACUTE}"; >> $x .= chr(0x20ac); >> print "is_utf8 returns ", utf8::is_utf8($x), "\n"; >> print "valid returns ", utf8::valid($x), "\n"; >> my $sth = $dbh->prepare($sql); >> $sth->bind_param(1, $x, {ora_csform => SQLCS_NCHAR}); >> $sth->execute(); >> $sql = "select a from martin"; >> $sth = $dbh->prepare($sql); >> $sth->execute; >> while(my @row = $sth->fetchrow_array) { >> print "valid on returned string returns ", utf8::valid($row[0]), "\n"; >> #$_ = decode("utf8", $_) foreach @row; >> print $row[0], "\n"; >> print unpack("H*", $row[0]), "\n"; >> print Dump(@row[0]); >> } >> $sth->finish; >> $dbh->disconnect(); >> >> Output is: >> >> <html><head><meta http-equiv="Content-Type" >> content="text/html;charset=utf-8"><title>test</title></head><body> >> OCI version is 10.1.0.2 >> ora_can_unicode = 1 >> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16 >> (Unicode) >> is_utf8 returns 1 >> valid returns 1 >> valid on returned string returns 1 >> <utf8 data here - snipped from email> >> e282ace282acc381e282ac >> SV = PV(0x82a3efc) at 0x82e5b48 >> REFCNT = 1 >> FLAGS = (POK,pPOK,UTF8) >> PV = 0x8361bb0 "\342\202\254\342\202\254\303\201\342\202\254"\0 [UTF8 >> "\x{20ac}\x{20ac}\x{c1}\x{20ac}"] >> CUR = 11 >> LEN = 12 >> </body></html> >> >> UTF-8 data output with Devel::Peek Dump looks fine. >> >> A simple change to create the table with nchar instead of nvarchar2 >> outputs: >> >> <html><head><meta http-equiv="Content-Type" >> content="text/html;charset=utf-8"><title>test</title></head><body> >> OCI version is 10.1.0.2 >> ora_can_unicode = 1 >> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16 >> (Unicode) >> is_utf8 returns 1 >> valid returns 1 >> valid on returned string returns 1 >> <utf8 data here snipped from email> >> 3f3f413f >> SV = PV(0x82a3efc) at 0x82e5c3c >> REFCNT = 1 >> FLAGS = (POK,pPOK,UTF8) >> PV = 0x8361cc0 "??A?"\0 [UTF8 "??A?"] >> CUR = 4 >> LEN = 5 >> </body></html> >> >> Devel::Peek Dump output looks nothing like what I expected. >> >> Also, do I have to use {ora_csform => SQLCS_NCHAR}? Does not seem to work in >> either case if I don't. >> >> And one final thing. If I change $x containing the UTF-8 data to insert and >> run >> $sth->execute again it inserts the original data again and not the changed >> $x. >> I've not confirmed this yet with other drivers, but I am /fairly/ sure with >> DBD::ODBC you don't have to call bind_param again if the bound data is >> changed. >> >> Thanks. >> >> Martin >> -- >> Martin J. Evans >> Easysoft Ltd, UK >> Development