While investigating a performance problem, I found a curious behaviour.

I have an Oracle database (10.2.0.3) and a matching client, DBI 1.50
and DBD::Oracle 1.19. Both the database and the client are set to use
the AL32UTF8 character set.
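
(For completeness: the client-side character set comes from NLS_LANG.
A quick sanity check from within DBI, purely a sketch and assuming an
already connected $dbh, would be something like this;
nls_database_parameters is a standard Oracle view, the rest is plain
DBI:

# Sketch: confirm the database character set and the client's NLS_LANG.
my ($db_charset) = $dbh->selectrow_array(
    "select value from nls_database_parameters
      where parameter = 'NLS_CHARACTERSET'");
print "database charset: $db_charset\n";
print "client NLS_LANG:  ", ($ENV{NLS_LANG} || '(unset)'), "\n";
)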

Now I do a simple query on one of the tables:

select id, name from datasets ds where name = ?

The table has about 300k rows and an index on the name column. 

The name I'm searching for contains only ASCII characters, but
depending on where it comes from, the string may or may not have Perl's
internal UTF8 flag set.
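
To be clear what I mean by "utf8 flag": the actual bytes are identical
in both cases, only Perl's internal SvUTF8 flag differs. A quick way to
see that (just a sketch, nothing Oracle-specific in it):

# Two strings with identical ASCII content, differing only in Perl's
# internal UTF8 flag. Devel::Peek::Dump prints the flags; the PV (the
# bytes) is the same in both cases.
use Devel::Peek qw(Dump);

my $bytes = 'CN-03021200-import-1000 ECU-CMX-rs5';
my $chars = $bytes;
utf8::upgrade($chars);      # sets the UTF8 flag, content unchanged

print utf8::is_utf8($bytes) ? "flagged\n" : "plain\n";   # plain
print utf8::is_utf8($chars) ? "flagged\n" : "plain\n";   # flagged
Dump($bytes);               # no UTF8 in FLAGS
Dump($chars);               # UTF8 shows up in FLAGS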

So my testcase currently looks like this:

#!/usr/bin/perl

use warnings;
use strict;
use utf8;

use DBI;
use Time::HiRes qw(time);


my $dbh = db_connect();

# Plain byte strings (UTF8 flag not set):
datasets_by_name($dbh, 'CN-03021200-import-1000 kg-CMX-rs4');

datasets_by_name($dbh, 'CN-03021200-import-1000 ECU-CMX-rs4');

# The same kind of ASCII-only string, but upgraded so that Perl's
# internal UTF8 flag is set:
my $s = 'CN-03021200-import-1000 ECU-CMX-rs5';
utf8::upgrade($s);
datasets_by_name($dbh, $s);

# Back to plain byte strings:
datasets_by_name($dbh, 'CN-03021200-import-1000 ECU-CMX-rs6');

datasets_by_name($dbh, 'CN-03021200-import-1000 kg-CMX-rs6');


sub read_cred {
    my ($fn) = @_;

    open(my $fh, '<', $fn) or die "cannot open $fn: $!";
    my $line = <$fh>;
    close($fh);
    my @cred = split(/[\s\n]+/, $line);
    return @cred;
}

sub db_connect {
    my $dbi_credential_file = $ENV{WZRP_CONN};
    my ($data_source, $username, $auth) = read_cred($dbi_credential_file);
    my $dbh = DBI->connect($data_source, $username, $auth,
                           { AutoCommit => 0,
                             PrintError => 0,
                             RaiseError => 1,
                           }
                          );
    $dbh->{FetchHashKeyName} = 'NAME_lc';
    return $dbh;
}

# Statement handle, reused between calls if PREPARE_ONCE is set.
my $sth;

sub datasets_by_name {
    my ($dbh, $name) = @_;
    my $t0 = time;
    unless ($ENV{PREPARE_ONCE} && $sth) {
        $sth = $dbh->prepare("select id, name from datasets ds where name = ?");
    }
    my $r = $dbh->selectall_hashref($sth, 'id', {}, $name);
    my $t1 = time;
    print $t1 - $t0, "\n";
}


The function datasets_by_name runs the query shown above. If it
prepares the statement anew on each call (PREPARE_ONCE unset), the
timings look like this:

0.00154304504394531
0.00111699104309082
0.27887487411499
0.00116086006164551
0.00110697746276855

I.e., searching for a plain byte string takes about 1 millisecond,
while searching for a string with the UTF8 flag set takes about 280
milliseconds! Oracle Enterprise Manager shows that in the slow case the
index on name is ignored and a full table scan is performed.
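
Since the values are pure ASCII anyway, one obvious (if ugly)
client-side workaround would be to strip the UTF8 flag before binding.
This is only a sketch, the helper name is made up, and it relies on the
assumption that the values never contain anything outside ASCII
(utf8::downgrade dies if a string can't be represented as bytes):

# Sketch of a possible workaround: drop the UTF8 flag from ASCII-only
# bind values before they reach DBD::Oracle. as_bytes is just an ad-hoc
# helper for this sketch; utf8::downgrade is a no-op on byte strings.
sub as_bytes {
    my ($value) = @_;
    utf8::downgrade($value) if utf8::is_utf8($value);
    return $value;
}

datasets_by_name($dbh, as_bytes($s));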

But wait, it gets even better. If the statement handle is cached and
reused across calls (PREPARE_ONCE set), the timings look like this:

0.00197005271911621
0.000653982162475586
0.280965089797974
0.2483971118927
0.250221967697144

I.e., the decision that the index can't be used (which I don't
understand in the first place) seems to be "sticky": once a
UTF8-flagged value has been bound, the subsequent byte-string searches
stay at around 250 ms as well.
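
Re-preparing the statement obviously avoids it (that's the first set of
timings), but I'd like to understand what exactly gets cached. Purely
as a sketch, and with no claim that it changes anything with
DBD::Oracle 1.19, an explicit bind type via plain DBI would look like
this:

# Speculative sketch: bind the placeholder with an explicit DBI type
# and execute/fetch by hand instead of going through selectall_hashref.
# SQL_VARCHAR comes from DBI's :sql_types tag; whether DBD::Oracle
# binds the value differently because of it is an open question.
use DBI qw(:sql_types);

sub datasets_by_name_typed {
    my ($dbh, $name) = @_;
    my $sth = $dbh->prepare(
        "select id, name from datasets ds where name = ?");
    $sth->bind_param(1, $name, SQL_VARCHAR);
    $sth->execute;
    return $sth->fetchall_hashref('id');
}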

        hp


-- 
   _  | Peter J. Holzer    | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR       | I'd be programming in Java.
| |   | [EMAIL PROTECTED]      | I don't, and I'm not.
__/   | http://www.hjp.at/ |   -- Jesse Erlbaum on dbi-users
