While investigating a performance problem I found a curious behaviour. I have an oracle database (10.2.0.3) and matching client, DBI 1.50 and DBD::Oracle 1.19. Database and client are set to use the AL32UTF8 charset.
Now I do a simple query on one of the tables: select id, name from datasets ds where name = ? The table has about 300k rows and an index on the name column. The name I'm searching for contains only ASCII characters, but depending on the source, the string may have the utf8 flag set. So my testcase currently looks like this: 1 #!/usr/bin/perl 2 use warnings; 3 use strict; 4 use utf8; 5 use DBI; 6 use Time::HiRes qw(time); 7 my $dbh = db_connect(); 8 datasets_by_name($dbh, 'CN-03021200-import-1000 kg-CMX-rs4'); 9 datasets_by_name($dbh, 'CN-03021200-import-1000 ECU-CMX-rs4'); 10 my $s ='CN-03021200-import-1000 ECU-CMX-rs5'; 11 utf8::upgrade $s; 12 datasets_by_name($dbh, $s); 13 datasets_by_name($dbh, 'CN-03021200-import-1000 ECU-CMX-rs6'); 14 datasets_by_name($dbh, 'CN-03021200-import-1000 kg-CMX-rs6'); 15 sub read_cred { 16 my ($fn) = @_; 17 open(FN, "<$fn") or die "cannot open $fn: $!"; 18 my $line = <FN>; 19 close(FN); 20 my @cred = split(/[\s\n]+/, $line); 21 return @cred; 22 } 23 sub db_connect { 24 my $dbi_credential_file = $ENV{WZRP_CONN}; 25 my ($data_source, $username, $auth) = read_cred($dbi_credential_file); 26 $dbh = DBI->connect($data_source, $username, $auth, 27 { AutoCommit => 0, 28 PrintError => 0, 29 RaiseError => 1 30 } 31 ); 32 $dbh->{FetchHashKeyName} = 'NAME_lc'; 33 return $dbh; 34 } 35 my $sth; 36 sub datasets_by_name { 37 my ($dbh, $name) = @_; 38 my $t0 = time; 39 unless ($ENV{PREPARE_ONCE} && $sth) { 40 $sth = $dbh->prepare("select id, name from datasets ds where name = ?"); 41 } 42 my $r = $dbh->selectall_hashref($sth, 'id', {}, $name); 43 my $t1 = time; 44 print $t1-$t0, "\n"; 45 } The function datasets_by_name invokes the query above. If it prepares the query on each request, the timings look like this: 0.00154304504394531 0.00111699104309082 0.27887487411499 0.00116086006164551 0.00110697746276855 I.e., searching for a byte string takes about 1 millisecond and searching for a utf8 string takes about 280 milliseconds! Oracle enterprise manager tells me that in the slow case the index is ignored and a full table scan is performed. But wait, it gets even better. If the statement handle for the query is cached and reused, the timings look like this: 0.00197005271911621 0.000653982162475586 0.280965089797974 0.2483971118927 0.250221967697144 I.e., the information that the index can't be used (which I don't understand in the first place) seems to be "sticky". hp -- _ | Peter J. Holzer | If I wanted to be "academically correct", |_|_) | Sysadmin WSR | I'd be programming in Java. | | | [EMAIL PROTECTED] | I don't, and I'm not. __/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users
pgpZ7srUVyTaL.pgp
Description: PGP signature