Re: Fun with UTF-8 and Oracle

Peter J. Holzer Mon, 03 Sep 2007 10:21:24 -0700

On 2007-09-03 13:03:37 +0100, Tim Bunce wrote:
> Oh the joys of Oracle character set support.
> 
> See the dbd_rebind_ph() code in dbdimp.c. Especially from the line
> "csform = phs->csform;" onwards.
> 
> You can explicitly set the csform and csid using bind_param(..., { ...  })
> 
> Have fun!


I had :-). You gave me a push into the right direction, and I think I
found the problem:

   1375     csform = phs->csform;
   1376 
   1377     if (!csform && SvUTF8(phs->sv)) {
   1378         /* try to default csform to avoid translation through 
non-unicode */
   1379         if (CSFORM_IMPLIES_UTF8(SQLCS_NCHAR))           /* prefer NCHAR 
*/
   1380             csform = SQLCS_NCHAR;
   1381         else if (CSFORM_IMPLIES_UTF8(SQLCS_IMPLICIT))
   1382             csform = SQLCS_IMPLICIT;
   1383         /* else leave csform == 0 */
   1384         if (trace_level)
   1385             PerlIO_printf(DBILOGFP, "       rebinding %s with UTF8 
value %s", phs->name,
   1386                 (csform == SQLCS_NCHAR)    ? "so setting 
csform=SQLCS_IMPLICIT" :
   1387                 (csform == SQLCS_IMPLICIT) ? "so setting 
csform=SQLCS_NCHAR" :
   1388                     "but neither CHAR nor NCHAR are unicode\n");
   1389     }

The first thing to notice is that the trace message is backwards: It
prints setting csform=SQLCS_IMPLICIT if the csform is set to SQLCS_NCHAR
and vice versa. So that explains why I didn't notice anything strange in
the trace output. While it claimed to set csform=SQLCS_IMPLICIT, it was
actually setting it to SQLCS_NCHAR, and I guess that was what prevented
the use of the index on the varchar2 column. Sure enough, if I
explicitely set ora_csform to 1, it's fast. It is also fast, if I set
NLS_NCHAR to US7ASCII, so that it isn't used. 

Maybe the tests in line 1379 and 1381 should be reversed so that
SQLCS_IMPLICIT is preferred over SQLCS_NCHAR? I guess there was some
good reason to prefer SQLCS_NCHAR, but it seems to me that SQLCS_IMPLICIT is
usually what you would want (but then I'm probably biased from my
absolute failure to use nvarchar or nclob columns with Oracle 8.0 from
Java or Perl many moons ago).


> But do please write up what you find as a patch the the Oracle docs.
> It's entirely possible that the code is doing the wrong thing.

Like this?

Index: Oracle.pm
===================================================================
--- Oracle.pm   (revision 9909)
+++ Oracle.pm   (working copy)
@@ -1462,7 +1462,14 @@
 
 Specify the OCI_ATTR_CHARSET_FORM for the bind value. Valid values
 are SQLCS_IMPLICIT (1) and SQLCS_NCHAR (2). Both those constants can
-be imported from the DBD::Oracle module. Rarely needed.
+be imported from the DBD::Oracle module.
+This should be rarely needed, but at least some versions of Oracle
+(e.g., 10.2) don't use an index if this is set wrong, which can happen
+if both the character set and the national character set are UTF-8 and
+the placeholder value has the utf8 flag set. In this case you can either
+choose ora_csform to match your column or (if you don't have any NCHAR,
+NVARCHAR2 or NCLOB columns) you can set NLS_NCHAR to US7ASCII to prevent
+its use.
 
 =item ora_csid
 

        hp

-- 
   _  | Peter J. Holzer    | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR       | I'd be programming in Java.
| |   | [EMAIL PROTECTED]      | I don't, and I'm not.
__/   | http://www.hjp.at/ |   -- Jesse Erlbaum on dbi-users

pgpNSensae0Jy.pgp
Description: PGP signature

Re: Fun with UTF-8 and Oracle

Reply via email to