Re: Fun with UTF-8 and Oracle

mark warren bracher Mon, 12 Nov 2007 08:38:20 -0800

Peter J. Holzer wrote:

On 2007-09-03 13:03:37 +0100, Tim Bunce wrote:

Oh the joys of Oracle character set support.


See the dbd_rebind_ph() code in dbdimp.c. Especially from the line
"csform = phs->csform;" onwards.

You can explicitly set the csform and csid using bind_param(..., { ...  })

Have fun!


I had :-). You gave me a push into the right direction, and I think I
found the problem:

   1375     csform = phs->csform;

13761377 if (!csform && SvUTF8(phs->sv)) {

   1378         /* try to default csform to avoid translation through 
non-unicode */
   1379         if (CSFORM_IMPLIES_UTF8(SQLCS_NCHAR))           /* prefer NCHAR 
*/
   1380             csform = SQLCS_NCHAR;
   1381         else if (CSFORM_IMPLIES_UTF8(SQLCS_IMPLICIT))
   1382             csform = SQLCS_IMPLICIT;
   1383         /* else leave csform == 0 */
   1384         if (trace_level)
   1385             PerlIO_printf(DBILOGFP, "       rebinding %s with UTF8 value 
%s", phs->name,
   1386                 (csform == SQLCS_NCHAR)    ? "so setting 
csform=SQLCS_IMPLICIT" :
   1387                 (csform == SQLCS_IMPLICIT) ? "so setting 
csform=SQLCS_NCHAR" :
   1388                     "but neither CHAR nor NCHAR are unicode\n");
   1389     }

The first thing to notice is that the trace message is backwards: It
prints setting csform=SQLCS_IMPLICIT if the csform is set to SQLCS_NCHAR
and vice versa. So that explains why I didn't notice anything strange in
the trace output. While it claimed to set csform=SQLCS_IMPLICIT, it was
actually setting it to SQLCS_NCHAR, and I guess that was what prevented
the use of the index on the varchar2 column. Sure enough, if I
explicitely set ora_csform to 1, it's fast. It is also fast, if I set

NLS_NCHAR to US7ASCII, so that it isn't used.

Maybe the tests in line 1379 and 1381 should be reversed so that
SQLCS_IMPLICIT is preferred over SQLCS_NCHAR? I guess there was some
good reason to prefer SQLCS_NCHAR, but it seems to me that SQLCS_IMPLICIT is
usually what you would want (but then I'm probably biased from my
absolute failure to use nvarchar or nclob columns with Oracle 8.0 from
Java or Perl many moons ago).

stumbled on this thread while debugging a problem with full table scansbinding utf8 strings into oracle queries...


it seems the degraded performance is "expected"

http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96529/ch6.htm#1008756

when you used csform SQLCS_NCHAR and the database column is CHAR,VARCHAR2 or CLOB. same is true of the reverse, using SQLCS_IMPLICITwhen the column is NCHAR, NVARCHAR2, NCLOB.

perhaps we need to rework CSFORM_IMPLIES_UTF8 so it doesn't return truefor csform SQLCS_NCHAR if the columns are CHAR, VARCHAR2 or CLOB?


- mark

Re: Fun with UTF-8 and Oracle

Reply via email to