On Mar 14, 2012, at 1:35 PM, Elizabeth Mattijsen wrote:
> Maybe some background first:
> 
> I'm working on code that does queries on large tables.  To be able to support 
> the growth of these tables, we need to shard the tables to multiple database 
> servers.  One way to shard this evenly, would be to shard per second of data: 
> even seconds go to one database, odd seconds go to another.
> 
> We have a lot of code that creates minutely / hourly / daily aggregations (in 
> Perl, the database just cannot do the kind of aggregations that we need) by 
> querying these tables.
> 
> When moving to sharded tables, I would like to keep the changes to the code 
> as minimal as possible.
> 
> By monkeypatching DBI::st, I've been able to reduce the code change to only 
> having to add another loop construct to the code, so something like:
> 
> =============================
> my %visitors;
> my $values;
> $visitors{ $values->[0] }= undef
>  while $values= $sth->fetchrow_arrayref;
> printf "Found %d unique visitors\n", scalar keys %visitors;
> =============================
> 
> Becomes:
> 
> =============================
> my %visitors;
> my $values;
> while ( $sth= $sth->next ) {
>    $visitors{ $values->[0] }= undef
>      while $values= $sth->fetchrow_arrayref;
> }
> printf "Found %d unique visitors\n", scalar keys %visitors;
> =============================
> 
> Then I found out about "swap_inner_handles" and realised that *if* I could 
> catch the depletion of the result set, I could do the $sth->next logic using 
> swap_inner_handles.  And make it completely invisible to the outside world.
> 
> 
> However, I've not been able to find a way to catch that state change.
> 
> I have tried:
> 
>  Stealing $sth->finish (on the assumption that that would be called when the 
> result set depletes)
> 
>  Stealing DBI::st::STORE (on the assumption that the tie interface would be 
> used to reset the Active attribute)
> 
> Neither of these work, unfortunately.
> 
> 
> So I wonder, would it make sense to add a statement handle attribute like 
> "last_record_seen" which would contain a code ref?  And have that code 
> execute with the statement handle as its only parameter whenever the result 
> set of a statement handle gets depleted, but before fetchrow_arrayref returns 
> undef?  If that would exist, then it would simply be a matter of putting the 
> swap_inner_handle logic in that code.
> 
> something like:
> 
> $sth->{last_record_seen}= sub {
>    my ($old)= @_;
>    $old->swap_inner_handle($new);  # wherever $new comes from, not important 
> here
> }

Actually, this callback would need to return whether or not to call the 
original fetch code again.  Something like:

$sth->{last_record_seen}= sub {
   my ($old)= @_;
   if ($new) {   # wherever $new comes from, not important here
       $old->swap_inner_handle($new);
       return 1;  # please try again
   }
   return 0; # we're really done, thank you
}



Liz

Reply via email to