RE: Oracle ncahr versus nvarchar2 and unicode data

Martin J. Evans Tue, 19 Jul 2005 06:32:26 -0700

I seem to have identified the difference between selecting from nchar and
nvarchar columns containing unicode data:


for nvarchar:
col  1: dbtype 1, scale 0, prec 512, nullok 1, name A
 : dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512

for ncahr:
col  1: dbtype 96, scale 0, prec 512, nullok 1, name A
 : dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512

and for nvarchar, DBD::Oracle makes the call:

calling OCIAttrSet OCI_ATTR_CHARSET_FORM with csform=2
 OCIAttrSet(83423a0,OCI_HTYPE_DEFINE,bfffed87,0,32,833c858)=SUCCESS

If I change oci8.c around line 1316 which was:

#ifdef OCI_ATTR_CHARSET_FORM
 if ( (fbh->dbtype == 1) ) { /*  && (fbh->csform == SQLCS_NCHAR) &&
CS_IS_UTF8(ncharsetid) ) { */
 /* ok... after doing what tim asked: setting SvUTF8 strictly based on csid
8bit Nchar test was broken
 and this currently effectively just sets Attrs to the values in fhb ignoring
ncharsetid altogether 
 probably wrong
 */

to:

#ifdef OCI_ATTR_CHARSET_FORM
        if ( (fbh->dbtype == 1) || (fbh->csform == SQLCS_NCHAR)) { 

I seem to get my data back as I expected.

I don't know much about DBD::Oracle so the comment in the code does
not help me that much but it looks as though the code used to be
fairly similar to how I changed it.

So, my question now becomes, should I be able to store unicode data
in an nchar (which I can with stock DBD::Oracle 1.16) and expect to be
able to get it back again as unicode data? Is DBD::Oracle working 
correctly or should I have done something else?

Martin

--
Martin J. Evans
Easysoft Ltd, UK
Development
 
On 19-Jul-2005 Martin J. Evans wrote:
> Hi,
> 
> I have some further information on my problem. The insertion of UTF-8 data
> into
> nchar seems to work like it does for nvarchar2; it appears it is the
> selecting
> from nchar fields that is not returning the data inserted. I have verified
> this
> by inserting the same data into nchar and nvarchar fields and reading it via
> Oracle's ODBC driver in MS Access - where it looks fine. But, retrieving it
> in
> Perl does not display the same results.
> 
> Martin
> --
> Martin J. Evans
> Easysoft Ltd, UK
> Development
> 
> 
> On 18-Jul-2005 Martin J. Evans wrote:
>> Hi,
>> 
>> Wondered if anyone can tell me if I've misunderstood something here. I am
>> able
>> to insert unicode data into nvarchar2 and get back what was inserted but
>> same
>> code with nchar column does not work.
>> 
>> OCI version is 10.1.0.2
>> Database is Oracle 10.2 on Linux
>> Database  CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> This is perl, v5.8.7 built for i686-linux
>> DBD::Oracle 1.16.
>> 
>>#!/disk2/martin/perl/bin/perl
>> use charnames ':full';
>> use DBI;
>> use Encode;
>> use DBD::Oracle qw( :ora_types ORA_OCI SQLCS_NCHAR );
>> use Devel::Peek;
>> 
>> if ( $] < 5.008) {
>>     print "Need at least Perl 5.8\n";
>>     exit 1;
>> }
>> binmode(STDOUT, ":utf8");
>> 
>> print "<html><head>";
>> print q|<meta http-equiv="Content-Type" content="text/html;charset=utf-8">|;
>> print "<title>test</title></head><body>\n";
>> 
>> $ENV{'ORACLE_HOME'} = '/home/oracle/product/10.1.0/db_2';
>> $ENV{'NLS_NCHAR'} = 'AL32UTF8';
>> my $dbh = DBI->connect('dbi:Oracle:tentwo.testing', 'xxx', 'yyy');
>> $dbh->{ChopBlanks} = 1;
>> print "OCI version is ", ORA_OCI, "\n";
>> print "ora_can_unicode = ", $dbh->ora_can_unicode, "\n";
>> if (!($dbh->ora_can_unicode & 1)) {
>>     print "Database does not support UNICODE\n";
>>     exit 1;
>> }
>> my $paramsH = $dbh->ora_nls_parameters();
>> printf "Database $ora_server_version CHAR set is %s (%s), NCHAR set is %s
>> (%s)\n",
>>     $paramsH->{NLS_CHARACTERSET}, 
>>     $dbh->ora_can_unicode & 2 ? "Unicode" : "Non-Unicode",
>>     $paramsH->{NLS_NCHAR_CHARACTERSET},
>>     $dbh->ora_can_unicode & 1 ? "Unicode" : "Non-Unicode";
>> eval {
>> $dbh->do('drop table martin');
>> };
>># NOTE nchar does not work!!!! (why?)
>> $dbh->do('create table martin (a nvarchar2(256))');
>> my $sql = "insert into martin values (?)";
>> my $x = "\N{EURO SIGN}\x{20ac}\N{LATIN CAPITAL LETTER A WITH ACUTE}";
>> $x .= chr(0x20ac);
>> print "is_utf8 returns ", utf8::is_utf8($x), "\n";
>> print "valid returns ", utf8::valid($x), "\n";
>> my $sth = $dbh->prepare($sql);
>> $sth->bind_param(1, $x, {ora_csform => SQLCS_NCHAR});
>> $sth->execute();
>> $sql = "select a from martin";
>> $sth = $dbh->prepare($sql);
>> $sth->execute;
>> while(my @row = $sth->fetchrow_array) {
>>     print "valid on returned string returns ", utf8::valid($row[0]), "\n";
>>     #$_ = decode("utf8", $_) foreach @row;
>>     print $row[0], "\n";
>>     print unpack("H*", $row[0]), "\n";
>>     print Dump(@row[0]);
>> }
>> $sth->finish;
>> $dbh->disconnect();
>> 
>> Output is:
>> 
>> <html><head><meta http-equiv="Content-Type"
>> content="text/html;charset=utf-8"><title>test</title></head><body>
>> OCI version is 10.1.0.2
>> ora_can_unicode = 1
>> Database  CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> is_utf8 returns 1
>> valid returns 1
>> valid on returned string returns 1
>> <utf8 data here - snipped from email>
>> e282ace282acc381e282ac
>> SV = PV(0x82a3efc) at 0x82e5b48
>>   REFCNT = 1
>>   FLAGS = (POK,pPOK,UTF8)
>>   PV = 0x8361bb0 "\342\202\254\342\202\254\303\201\342\202\254"\0 [UTF8
>> "\x{20ac}\x{20ac}\x{c1}\x{20ac}"]
>>   CUR = 11
>>   LEN = 12
>> </body></html>
>> 
>> UTF-8 data output with Devel::Peek Dump looks fine.
>>   
>> A simple change to create the table with nchar instead of nvarchar2
>> outputs:
>> 
>> <html><head><meta http-equiv="Content-Type"
>> content="text/html;charset=utf-8"><title>test</title></head><body>
>> OCI version is 10.1.0.2
>> ora_can_unicode = 1
>> Database  CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> is_utf8 returns 1
>> valid returns 1
>> valid on returned string returns 1
>> <utf8 data here snipped from email>
>> 3f3f413f
>> SV = PV(0x82a3efc) at 0x82e5c3c
>>   REFCNT = 1
>>   FLAGS = (POK,pPOK,UTF8)
>>   PV = 0x8361cc0 "??A?"\0 [UTF8 "??A?"]
>>   CUR = 4
>>   LEN = 5
>> </body></html>
>> 
>> Devel::Peek Dump output looks nothing like what I expected.
>> 
>> Also, do I have to use {ora_csform => SQLCS_NCHAR}? Does not seem to work in
>> either case if I don't.
>> 
>> And one final thing. If I change $x containing the UTF-8 data to insert and
>> run
>> $sth->execute again it inserts the original data again and not the changed
>> $x.
>> I've not confirmed this yet with other drivers, but I am /fairly/ sure with
>> DBD::ODBC you don't have to call bind_param again if the bound data is
>> changed.
>> 
>> Thanks.
>> 
>> Martin
>> --
>> Martin J. Evans
>> Easysoft Ltd, UK
>> Development

RE: Oracle ncahr versus nvarchar2 and unicode data

Reply via email to