Re: Having a code ref executed when result set is all fetched

Tim Bunce Fri, 30 Mar 2012 02:34:34 -0700

On Tue, Mar 27, 2012 at 04:17:22PM +0200, Elizabeth Mattijsen wrote:
> >> Becomes:
> >> =============================
> >> my %visitors;
> >> my $values;
> >> while ( $sth= $sth->next ) {
> >>    $visitors{ $values->[0] }= undef
> >>      while $values= $sth->fetchrow_arrayref;
> >> }
> >> printf "Found %d unique visitors\n", scalar keys %visitors;
> >> =============================
> > 
> > There's actually an (almost completely undocumented) more_results()
> > method that's meant for that kind of thing. I was rather surprised
> > to find it wasn't documented as I'm pretty sure we thrashed out the
> > semantics some time ago.
> > 
> > [later] Wow, "some time ago" was 2004 :)
> > See 
> > http://markmail.org/thread/i7drouwtpdiybjzn#query:+page:1+mid:p4dy4uv3t2kpku56+state:results
> > The "example implementation" in there assumed $sth->prepare works.
> > Given current realities the more_results method would be like:
> >    $sth = $sth->{Database}->prepare($sql);
> >    $sth->execute;
> >    return $sth; # caller must use this returned handle
> 
> The "next" method looks a lot like the "more_results" method, but with
> one very important difference.  The "more_results" method depends on
> database driver support for multiple result sets.  In my case, each
> statement could be executed using a *different* database handle,
> potentially on a different database instance, potentially with a
> different database driver.
> 
> In other words: "more_results" is too limited in its scope to be
> useful for what I'm trying to accomplish.


I'm suggesting that you use/write a driver that holds multiple database
handles and hides the switching from one to another.
See, for example, http://search.cpan.org/perldoc?DBD::Multiplex
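
To make the idea concrete, here is a minimal pure-Perl sketch of "hold several handles and hide the switching". It is not DBD::Multiplex and not a real DBI driver: ToySth and ChainedSth are hypothetical names invented for illustration, and ToySth merely stands in for anything that offers fetchrow_arrayref.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle: any object with a
# fetchrow_arrayref method would do.
package ToySth;
sub new { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
sub fetchrow_arrayref { my ($self) = @_; return shift @{ $self->{rows} } }

# Hypothetical wrapper that hides the switch from one handle to the next.
package ChainedSth;
sub new { my ($class, @handles) = @_; return bless { handles => [@handles] }, $class }

sub fetchrow_arrayref {
    my ($self) = @_;
    while ( my $current = $self->{handles}[0] ) {
        my $row = $current->fetchrow_arrayref;
        return $row if $row;
        shift @{ $self->{handles} };    # exhausted: move on to the next handle
    }
    return;                             # every handle is exhausted
}

package main;

my $sth = ChainedSth->new(
    ToySth->new( ['alice'], ['bob'] ),
    ToySth->new( ['bob'],   ['carol'] ),
);

# The application keeps its ordinary single fetch loop.
my %visitors;
while ( my $values = $sth->fetchrow_arrayref ) {
    $visitors{ $values->[0] } = undef;
}
printf "Found %d unique visitors\n", scalar keys %visitors;
```

The point of the sketch is only that the application loop stays unchanged; a real driver layer would also have to deal with prepare, execute, attributes and error handling across the underlying handles.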


> >>  Stealing DBI::st::STORE (on the assumption that the tie interface would 
> >> be used to reset the Active attribute)
> > Again, compiled drivers tend to take shortcuts (in this case via the
> > DBIc_ACTIVE_off macro defined in DBIXS.h).
> 
> Ok, if I interpret this correctly, then a change in that macro that
> would allow support for a code reference to be called, would be the
> only change needed.  And a recompile of any database driver, of
> course.   But that would be transparent for anybody installing from
> source, right?

I wouldn't be comfortable with a code ref being called from within a
macro like that. The driver is in an unknown state at the time it's
called and re-entering the driver may cause unpredictable problems.

Adding a smart driver layer, along the lines of DBD::Multiplex, seems
like a better approach. What you're doing is the kind of thing
DBD::Multiplex was intended for (though it got neglected).

> > It's not quite that simple if you're trying to avoid code changes in the
> > application - i.e., if a while(fetch) loop should transparently return
> > rows from multiple statements. The problem is the control flow.
> > 
> > At what point in the dispatcher does last_record_seen get called
> > and how does it, or the dispatcher, then arrange to (re)call the
> > fetch method to get the next row?
> > 
> > Hooking into the Active flag being turned off wouldn't be safe
> > since the driver is not expecting to be reentered at that point.
> 
> I guess my use of swap_inner_handle would circumvent that.  Nothing
> changes for the old inner statement handle.  And the new inner
> statement handle would simply become active.  The driver would not
> need to know about this, as this is already at statement handle level.
> Or am I missing something?

It's certainly worth exploring. Suitably documented with caveats.

> > The existing Callback mechanism 
> > http://search.cpan.org/~timb/DBI/DBI.pm#Callbacks
> > only works on the pre-call side of the dispatcher, currently.
> > Extending it to allow callbacks on the post-call (return) side is
> > certainly a possibility that I've outlined previously. To avoid being
> > too expensive for your needs the mechanism would have to support an
> > optional "only call this callback if the return value is false" flag.
> > 
> > So perhaps we'd end up with something like this
> > 
> >    $sth->{Callbacks}{fetchrow_arrayref} = [
> >        undef,  # pre-call hook
> >        undef,  # post-call hook for true returns
> >        sub {   # post-call hook for false returns
> >            ...
> >        }
> >    ];
> 
> Thinking about this some more, the way forward for this seems to be
> adding support for another special callback key, e.g. 'Active.off' (by
> adapting the DBIc_ACTIVE_off macro).  Its parameters would be the
> statement handle on which the last record was just fetched (causing
> the Active attribute to be set to "off"), and the method name that was
> used to exhaust the resultset.

This would also be an abuse of the design and implementation of the
Callbacks mechanism, which is rooted in the DBI dispatcher control flow.
The DBIc_ACTIVE_off macro doesn't know the name of the method, for example.
(The connect_cached.reused and connect_cached.new hooks have set a
precedent already, albeit one I'm not very happy about.)

> >> Actually, this callback would need to return whether or not to call the 
> >> original fetch code again.  Something like:
> >> 
> >> $sth->{last_record_seen}= sub {
> >>   my ($old)= @_;
> >>   if ($new) {   # wherever $new comes from, not important here
> >>       $old->swap_inner_handle($new);
> >>       return 1;  # please try again
> >>   }
> >>   return 0; # we're really done, thank you
> >> }
> > 
> > I don't know how easy or safe it would be to introduce a loop into the
> > dispatcher to allow that. Maybe not too hard. It may be a better API to
> > treat the return value of the post-call callback as the value to be
> > returned to the application, so the callback would end with
> > 
> >    return $sth->fetchrow_arrayref;
> 
> I guess that would need to be:
> 
>   $sth->{Callbacks}->{'Active.off'}= sub {
>       my ( $old, $method )= @_;
>       if ($new) {  # wherever $new comes from
>           $old->swap_inner_handle($new);
>           return $old->$method;
>       }
>       return;
>   };

An Active.off callback can't know the name of the method. (Not without
some hackery in the dispatcher to record it, or crawling up the call stack.)
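
By contrast, a hook that lives in the dispatcher gets the method name for free. The following toy sketch shows a post-call hook that runs only on a false return and whose return value is handed back to the application. Everything here is hypothetical: dispatch(), the on_false slot and ToySth are illustration only, not DBI's dispatcher or its Callbacks API, and copying the rows stands in for swap_inner_handle.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle.
package ToySth;
sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows], on_false => undef }, $class;
}
sub fetchrow_arrayref { my ($self) = @_; return shift @{ $self->{rows} } }

package main;

# Toy dispatcher: it knows the method name, so it can pass it to a
# post-call hook that is consulted only when the method returns false.
sub dispatch {
    my ($handle, $method, @args) = @_;
    my $result = $handle->$method(@args);
    return $result if $result;
    my $hook = $handle->{on_false} or return $result;
    return $hook->( $handle, $method );    # hook's value goes to the caller
}

my @pending = ( ToySth->new( ['carol'] ) );
my $sth     = ToySth->new( ['alice'], ['bob'] );

$sth->{on_false} = sub {
    my ($old, $method) = @_;
    my $new = shift @pending or return;    # nothing left: really done
    $old->{rows} = $new->{rows};           # stand-in for swap_inner_handle
    return $old->$method;                  # retry the same method
};

my @seen;
while ( my $row = dispatch( $sth, 'fetchrow_arrayref' ) ) {
    push @seen, $row->[0];
}
print "@seen\n";
```

Because the hook is only looked up after a false return, the common case of a successful fetch pays nothing beyond the truth test.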

> Or is there in the end just one method that can exhaust a resultset?

Lots of methods can exhaust a resultset. Theoretically the finish()
method should be called when they do. The fact that many don't is an
optimization that could be optionally disabled. Combine that with
extending the Callbacks mechanism to optionally apply to nested DBI
calls made by a driver, and you might have a workable approach.
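
As a rough illustration of why finish() is an attractive hook point, here is a toy in which every method that exhausts the rows goes through finish(), so a single callback catches them all. ToySth and its on_finish slot are hypothetical; real drivers often skip the finish() call as an optimization, as noted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle that always calls
# finish() once its rows are exhausted.
package ToySth;
sub new {
    my ($class, %args) = @_;
    return bless {
        rows      => [ @{ $args{rows} } ],
        on_finish => $args{on_finish},
        active    => 1,
    }, $class;
}

sub finish {
    my ($self) = @_;
    return 1 unless $self->{active};    # already finished: fire only once
    $self->{active} = 0;
    $self->{on_finish}->($self) if $self->{on_finish};
    return 1;
}

sub fetchrow_arrayref {
    my ($self) = @_;
    my $row = shift @{ $self->{rows} };
    $self->finish unless $row;          # exhausted row by row
    return $row;
}

sub fetchall_arrayref {
    my ($self) = @_;
    my @all = splice @{ $self->{rows} };
    $self->finish;                      # exhausted in one call
    return \@all;
}

package main;

my @finished;
my $by_row = ToySth->new(
    rows      => [ [1], [2] ],
    on_finish => sub { push @finished, 'row-by-row' },
);
1 while $by_row->fetchrow_arrayref;

my $in_bulk = ToySth->new(
    rows      => [ [3], [4] ],
    on_finish => sub { push @finished, 'bulk' },
);
$in_bulk->fetchall_arrayref;

print "finished: @finished\n";
```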

Seems worth exploring.

Tim.
