On Mar 14, 2012, at 1:35 PM, Elizabeth Mattijsen wrote: > Maybe some background first: > > I'm working on code that does queries on large tables. To be able to support > the growth of these tables, we need to shard the tables to multiple database > servers. One way to shard this evenly, would be to shard per second of data: > even seconds go to one database, odd seconds go to another. > > We have a lot of code that creates minutely / hourly / daily aggregations (in > Perl, the database just cannot do the kind of aggregations that we need) by > querying these tables. > > When moving to sharded tables, I would like to keep the changes to the code > as minimal as possible. > > By monkeypatching DBI::st, I've been able to reduce the code change to only > having to add another loop construct to the code, so something like: > > ============================= > my %visitors; > my $values; > $visitors{ $values->[0] }= undef > while $values= $sth->fetchrow_arrayref; > printf "Found %d unique visitors\n", scalar keys %visitors; > ============================= > > Becomes: > > ============================= > my %visitors; > my $values; > while ( $sth= $sth->next ) { > $visitors{ $values->[0] }= undef > while $values= $sth->fetchrow_arrayref; > } > printf "Found %d unique visitors\n", scalar keys %visitors; > ============================= > > Then I found out about "swap_inner_handles" and realised that *if* I could > catch the depletion of the result set, I could do the $sth->next logic using > swap_inner_handles. And make it completely invisible to the outside world. > > > However, I've not been able to find a way to catch that state change. > > I have tried: > > Stealing $sth->finish (on the assumption that that would be called when the > result set depletes) > > Stealing DBI::st::STORE (on the assumption that the tie interface would be > used to reset the Active attribute) > > Neither of these work, unfortunately. > > > So I wonder, would it make sense to add a statement handle attribute like > "last_record_seen" which would contain a code ref? And have that code > execute with the statement handle as its only parameter whenever the result > set of a statement handle gets depleted, but before fetchrow_arrayref returns > undef? If that would exist, then it would simply be a matter of putting the > swap_inner_handle logic in that code. > > something like: > > $sth->{last_record_seen}= sub { > my ($old)= @_; > $old->swap_inner_handle($new); # wherever $new comes from, not important > here > }
Actually, this callback would need to return whether or not to call the original fetch code again. Something like: $sth->{last_record_seen}= sub { my ($old)= @_; if ($new) { # wherever $new comes from, not important here $old->swap_inner_handle($new); return 1; # please try again } return 0; # we're really done, thank you } Liz