Re: Fun with UTF-8 and Oracle

Tim Bunce Tue, 04 Sep 2007 07:11:03 -0700

On Mon, Sep 03, 2007 at 07:20:42PM +0200, Peter J. Holzer wrote:
> On 2007-09-03 13:03:37 +0100, Tim Bunce wrote:
> > Oh the joys of Oracle character set support.
> > 
> > See the dbd_rebind_ph() code in dbdimp.c. Especially from the line
> > "csform = phs->csform;" onwards.
> > 
> > You can explicitly set the csform and csid using bind_param(..., { ...  })
> > 
> > Have fun!
> 
> I had :-). You gave me a push into the right direction, and I think I
> found the problem:
> 
>    1375     csform = phs->csform;
>    1376 
>    1377     if (!csform && SvUTF8(phs->sv)) {
>    1378         /* try to default csform to avoid translation through 
> non-unicode */
>    1379         if (CSFORM_IMPLIES_UTF8(SQLCS_NCHAR))           /* prefer 
> NCHAR */
>    1380             csform = SQLCS_NCHAR;
>    1381         else if (CSFORM_IMPLIES_UTF8(SQLCS_IMPLICIT))
>    1382             csform = SQLCS_IMPLICIT;
>    1383         /* else leave csform == 0 */
>    1384         if (trace_level)
>    1385             PerlIO_printf(DBILOGFP, "       rebinding %s with UTF8 
> value %s", phs->name,
>    1386                 (csform == SQLCS_NCHAR)    ? "so setting 
> csform=SQLCS_IMPLICIT" :
>    1387                 (csform == SQLCS_IMPLICIT) ? "so setting 
> csform=SQLCS_NCHAR" :
>    1388                     "but neither CHAR nor NCHAR are unicode\n");
>    1389     }
> 
> The first thing to notice is that the trace message is backwards: It
> prints setting csform=SQLCS_IMPLICIT if the csform is set to SQLCS_NCHAR
> and vice versa. So that explains why I didn't notice anything strange in
> the trace output. While it claimed to set csform=SQLCS_IMPLICIT, it was
> actually setting it to SQLCS_NCHAR, and I guess that was what prevented
> the use of the index on the varchar2 column. Sure enough, if I
> explicitely set ora_csform to 1, it's fast. It is also fast, if I set
> NLS_NCHAR to US7ASCII, so that it isn't used. 
> 
> Maybe the tests in line 1379 and 1381 should be reversed so that
> SQLCS_IMPLICIT is preferred over SQLCS_NCHAR?


Sounds very plausible.

> I guess there was some good reason to prefer SQLCS_NCHAR, but it seems
> to me that SQLCS_IMPLICIT is usually what you would want

It was probably just the way the code evolved.

> (but then I'm probably biased from my absolute failure to use nvarchar
> or nclob columns with Oracle 8.0 from Java or Perl many moons ago).
>
> > But do please write up what you find as a patch the the Oracle docs.
> > It's entirely possible that the code is doing the wrong thing.
> 
> Like this?

That's great, but why stop there? Now you seem to have identified the
problem, how about patching the code instead ;-)

Tim.

> Index: Oracle.pm
> ===================================================================
> --- Oracle.pm   (revision 9909)
> +++ Oracle.pm   (working copy)
> @@ -1462,7 +1462,14 @@
>  
>  Specify the OCI_ATTR_CHARSET_FORM for the bind value. Valid values
>  are SQLCS_IMPLICIT (1) and SQLCS_NCHAR (2). Both those constants can
> -be imported from the DBD::Oracle module. Rarely needed.
> +be imported from the DBD::Oracle module.
> +This should be rarely needed, but at least some versions of Oracle
> +(e.g., 10.2) don't use an index if this is set wrong, which can happen
> +if both the character set and the national character set are UTF-8 and
> +the placeholder value has the utf8 flag set. In this case you can either
> +choose ora_csform to match your column or (if you don't have any NCHAR,
> +NVARCHAR2 or NCLOB columns) you can set NLS_NCHAR to US7ASCII to prevent
> +its use.
>  
>  =item ora_csid
>  
> 
>       hp
> 
> -- 
>    _  | Peter J. Holzer    | If I wanted to be "academically correct",
> |_|_) | Sysadmin WSR       | I'd be programming in Java.
> | |   | [EMAIL PROTECTED]      | I don't, and I'm not.
> __/   | http://www.hjp.at/ |   -- Jesse Erlbaum on dbi-users

Re: Fun with UTF-8 and Oracle

Reply via email to