Re: interface ideas for non-blocking mode

david nicol Fri, 27 Aug 2004 22:55:14 -0700

On Thu, 2004-08-26 at 03:04, Tim Bunce wrote:

> We would all like many things but have to settle for what's practical.
> ...
>
> Tim.

umm okay

All I knew, before my unsuccessful attempt to locate instances of
connect(2) in DBD::ODBC's xs and dbdimp.c files, was that I have in 
the past written code, in C and C++, that talked with SQL servers by
opening regular old TCP streams to servers and conversing on them.

Informix and Oracle.  And both times I was extending working code
so I didn't have to know or support all the possibilities.

A TCP stream blocks mainly when one is waiting to read from it, and
this can be worked around, even on systems that do not have
nonblocking sockets, by only trying to read from the socket when
select(2) has indicated that there is something there. (If you're
multithreading you also need a mutex to avoid the race condition, so
the workaround still holds provided you can set up a nonblocking
mutex.) A stream can also block when you're writing a lot to it, but
this can be worked around by being sure to send only small chunks
when you can't get nonblocking sockets.
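
A minimal sketch of the select-before-read idea, using only core Perl modules. The socket pair merely stands in for a TCP stream to a server, and read_if_ready is a hypothetical helper name, not part of DBI or any driver:

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Read from $sock only if select(2) reports pending data.
# Returns the bytes read, or undef if nothing is waiting.
sub read_if_ready {
    my ($sock, $timeout) = @_;
    my @ready = IO::Select->new($sock)->can_read($timeout);  # 0 = poll
    return undef unless @ready;
    my $n = sysread($sock, my $buf, 4096);
    return $n ? $buf : undef;   # EOF and errors collapse to undef in this sketch
}

# A socket pair stands in for the TCP stream to the server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $before = read_if_ready($client, 0);   # nothing has been sent yet
syswrite($server, "row 1\n");
my $after  = read_if_ready($client, 1);   # data is now waiting
```

The read is attempted only after select(2) says it cannot block, which is the whole trick.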

In the application I am currently working on, I want prepare and
execute to return as soon as they can, even if error reports will get
deferred, and I would like to know if data is ready before attempting
to fetch a row, so I don't have to wait for the server to send it.

Expecting the *all* access functions to return partial sets is
not a core requirement for nonblocking support; mandating that the *all*
functions block would work just as well and would not slow down the
blocking case with needless checks.

Yes this class of issues can be trivially solved by demanding threading,
but that does not help when a(n unrealistic?) design constraint limits
you to one thread.

I realized after posting the feature request for ready(), more(), done()
that $h->timeout() also would make sense.  select(2) defines the timeout
as a time value or null for indefinite blocking, for those tuning in
late.

Without a mandated interface to non-blocking, drivers will all 
implement it differently. I think it is not only practical but a good
idea to define a standard interface before every driver that can support
a nonblocking mode does it differently.

@{$sth}{qw/more done/} can be defined in terms of $sth->{Active}:

        sub more{  $_[0]->{Active}}
        sub done{ !$_[0]->{Active}}

supporting $h->ready implies that a driver supports a non-blocking
mode, in which partial state is maintained in the handle and we are
waiting for something from the other end.  In a blocking driver,
$h->ready would always be true because we are never in an incomplete
state.  Blocking DBI either returns or throws, then that phase of the
operation is complete.

In a nonblocking situation, a driver might return immediately, but
register a callback with itself so that when ready() is called, the
callback will run, and the pending operation will either get completed
and ready() will be true, or the pending operation will still be
pending and ready() will still be false.  If the pending operation
failed, ready() would throw the deferred error.
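
A rough sketch of those semantics with a mock handle. MockHandle and its pending code ref are hypothetical illustrations, not DBI classes; the code ref plays the role of the callback the driver registered with itself:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle; not the real DBI class.
package MockHandle {
    sub new {
        my ($class, %arg) = @_;
        # pending: code ref returning true once the operation has finished,
        # false while it is in flight, and dying to report a deferred error.
        return bless { Active => 1, pending => $arg{pending} }, $class;
    }
    sub more { $_[0]->{Active} }
    sub done { !$_[0]->{Active} }
    sub ready {
        my ($self) = @_;
        my $cb = $self->{pending} or return 1;  # blocking driver: always ready
        return 0 unless $cb->();                # still pending; a die propagates
        delete $self->{pending};                # operation completed
        return 1;
    }
}

my $polls = 0;
my $slow  = MockHandle->new(pending => sub { ++$polls >= 3 });
my @seen  = map { $slow->ready } 1 .. 4;   # false twice, then true

my $bad = MockHandle->new(pending => sub { die "Deferred: syntax error\n" });
my $err = eval { $bad->ready; 1 } ? '' : $@;   # the deferred error surfaces here
```

A handle with nothing pending behaves like a blocking driver: ready() is simply true.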

To make things easier, (prepare execute fetch) could be mandated to all
internally do 

     readycheck: $me->ready() or goto readycheck;

as they start, or even better,

     {
      my $oldtimeout = $me->{Timeout};
      undef $me->{Timeout};          # no timeout: ready() must finish or throw
      my $ok = $me->ready();
      $me->{Timeout} = $oldtimeout;  # restore even when reporting an error
      return $me->set_err($err, "Deferred $errmsg", $state) unless $ok;
     }

Or even better still, a driver could stack requests up, and each would
become ready or not-ready on its own.  No flexibility is taken from the
driver implementors.

So my proposal is to declare three read-only access functions
and one attribute.  These are more, done, ready and {Timeout}.

If pointed to the simplest TCP-based driver, I will cheerfully
add this stuff to demonstrate that it's practical.

Referring to M. Peppler's link:
http://sybooks.sybase.com/onlinebooks/group-sd/sdg1251e/ctref/@Generic__BookTextView/1039;pt=799/*#X

the ready() function would be always true in Synchronous mode, would
invoke ct_poll in Deferred Asynchronous mode, and would check to see if
the callback has occurred automatically in fully asynchronous mode:

   @AsyncStHandles = map {$dbh->prepare($_)} @Statements;
   do{
     $pending = 0;
     for(@AsyncStHandles){
       unless($_->ready()){
          $pending++;
          next;
       };
       ...
       $_->more and $pending++;
     }
   } while $pending;

How cool is that?

Full support of the optional NonBlocking attribute (settable by
attempting to define the Timeout attribute?) in general would
do something like Deferred Asynchronous mode.

The exact reason the pending operation is not ready (CS_BUSY,
EWOULDBLOCK, etc.) could be described in $DBI::errstr.  ready() shouldn't
throw just because it isn't ready, so set_err would not be the right
way to set the message.

J Leffler wrote:
> One of the issues I think that the specification will have to address,
> probably restrictively, is whether you can have both an asynchronous
> (non-blocking) statement and other synchronous (or, indeed, other
> asynchronous) statements active on a single $dbh -- I suspect that the
> portable answer will be "No; only one statement, whether synchronous
> or asynchronous, can be active on a $dbh at any given time".

That would be the portable answer, but the non-portable answer would be
"Yes, if your driver supports it," just like any other fancy feature.

It's easy to imagine a drh opening additional streams for additional
statement handles, for instance, to mock up multiple simultaneous
asynchronous statements against a back-end that does not do that 
natively.

> ... Perl threading ...

How drivers implement the standard interface is up to the authors of
individual drivers. If your driver requires a threaded perl, your driver
requires a threaded perl. If your driver breaks on a threaded perl, your
driver breaks on a threaded perl.

S. Goeldner wrote:

> it looks like ADO uses events:
>  
> http://msdn.microsoft.com/library/en-us/ado270/htm/mdmscadoevents.asp

so, an asynchronous ADO driver might return immediately after issuing an
UPDATE command, and would not be ready until a RecordChangeComplete
event was received.

Dean Arnold reports:
> 
> DBD::Teradata has async support via driver-specific
> methods/attributes:
> 
> my @dbhs = ();
> my @sths = ();
> 
> ...connect N sessions, storing handles in @dbhs...
> 
> foreach (@dbhs) {
>         push @sths, $_->prepare('insert into table values(?,?,?)',
>                 { tdat_nowait => 1 });
> }
> 
> foreach (@sths) {
>         $_->execute([shift @paramtuples]);
> }
> 
> while (params to load) {
>         @avails = $drh->tdat_FirstAvailList([EMAIL PROTECTED], $timeout);
> 
>         foreach (@avails) {
>                 $rc = $sths[$_]->tdat_Realize();
>                 $sths[$_]->execute([shift @paramtuples]);
>         }
> }
> 
> The API also supports including filehandles in the list passed to
> tdat_FirstAvailList, in order to handle other async I/O events.

It sounds like tdat_FirstAvailList([EMAIL PROTECTED], $timeout) is a wrapper
around select(2) that tells us that there is data available on the
stream, not necessarily that a whole row has been returned.

Looking at http://www.presicient.com/tdatdbd/#realize I gather that
Teradata's execute() never blocks and tdat_Realize runs the callback on
a statement, to completion.

The ready() method I am proposing combines both of these, so that in
nonblocking mode ready would return false until enough data has come
back that tdat_Realize would not have to block were it to run.

> While this handles async execution, esp. in support of multiconnection
> operations relevant to Teradata's parallel nature,
> it doesn't really handle async completion notification, and
> I'm not certain there's a clean, DBMS-independent way to support
> that without using timers, signals, or threads, all of which
> may be problematic.

Working out the implementation is not our problem yet (well it is our
problem if we're maintaining a driver, but it's not our problem if we're
wearing "Lords Of The DBI Specification" hats)

http://www.presicient.com/tdatdbd/#optimize indicates that tdat_
already returns immediately from prepare(). Using the proposed
interface, if something were to go wrong with the prepare(), the error
might get reported by a subsequent call to ready() before the error
would be reported in the failing execute().

Regarding http://www.presicient.com/tdatdbd/#dblbuf: in a situation
where there are multiple active statements, the result of the ready()
method on each object in the system will mean different things.

    $dbh->ready() # Can the session accept more instructions without
                  # blocking or are we still chewing on the last
                  # prepare and there's no more space in the statement
                  # preparation queue?

    $sth1->ready() # is this statement able to have a complete row
                   # fetched from it without blocking?

    $sth2->ready() # or this one?

http://www.presicient.com/tdatdbd/#moreres 
> When a fetch operation returns undef, a non zero tdat_more_results
> value indicates more Teradata statements are available for fetching on
> the statement handle. 

In the proposed interface, fetch would throw the equivalent of an
EWOULDBLOCK error with its undef (how far it throws it depends on 
what RaiseError is set to) when there is more NO NO NO!
fetch would block if called while $sth->ready is still false. 
Maybe the standard could specify two levels of asynchronous
support, one in which fetch gives an error when there is more data and
another in which it blocks, but that is needlessly complex.  The
situation is analogous to the discussions that I imagine occurred in
Redmond whenever they decided that Microsoft sockets would not support
non-blocking modes. We can say that a true ready indicates that there
is at least one row to fetch, and driver behavior when fetch is called
before readiness happens is left to the driver: it can block, or 
treat the situation as an error, either one.

Finally, Tim Bunce agreed with Dean Arnold that
> Also, in many (most?) instances, driver support for Perl threads
> may obviate the need for an async API; in fact, I'd prefer to see
> driver developers focus first on thread support, since that doesn't
> really require any API definitions, and provides much the same
> capability.

Here's a proposed non-blocking API extension.  Approving it does not
interfere with development of improved threading support. In fact,
approving it may encourage threading development as threads will be
within-limits for implementation strategies.  

Here is my proposal for a non-blocking DBI extension that is
a full and unchanged superset of current synchronous DBI:

  Requesting a connection in nonblocking mode: 

  $dbh = DBI->connect($source, $user, $passwd, {Timeout => 0})

  $h->{Timeout}  if supported, recommends a timeout for blocking
  calls and is inherited and is changeable at every level without
  affecting parent.  C<undef $h->{Timeout}> gives synchronous
  behavior.

  Conformance to this extension can be determined by the existence
  of the ready() method.

  $sth->ready() if supported, guarantees that the next fetch
  will not block, or that a non-data-returning command has completed.
  Invokes communication-related callbacks as in Sybase's Deferred
  Asynchronous mode. Sets $errstr and may die when RaiseError is set.

  $dbh->ready() if supported, guarantees that a session handle
  will not block when asked to prepare a new statement handle.
  Sets $errstr and may die from a deferred error when RaiseError is set

  $h->done() is a trivial wrapper for !$h->{Active}

  $h->more() is a trivial wrapper for $h->{Active}

  Drivers whose prepare and execute cannot return before the server
  has answered should not implement $dbh->ready. Intermediate
  versions of individual drivers may block on any methods.

  $t_a_r = $sth->fetchavail_arrayref( $slice, $max_rows )
    just like fetchall_arrayref, except that only data that has already
    arrived and been buffered is returned.

  The *all* functions will continue to block.
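
To make the fetchavail_arrayref idea concrete, here is a small sketch against a mock handle. MockSth and its _arrive method are hypothetical; _arrive only simulates rows showing up from the server, and $slice handling is left out:

```perl
use strict;
use warnings;

# Hypothetical statement handle that only models the receive buffer.
package MockSth {
    sub new { return bless { buffer => [] }, shift }

    # Simulates rows arriving from the server.
    sub _arrive { my $self = shift; push @{ $self->{buffer} }, @_; return }

    # Like fetchall_arrayref, but never waits: hands back what is buffered.
    # $slice is accepted for signature compatibility and ignored here.
    sub fetchavail_arrayref {
        my ($self, $slice, $max_rows) = @_;
        my $buf = $self->{buffer};
        my $n   = @$buf;
        $n = $max_rows if defined $max_rows && $max_rows < $n;
        return [ splice @$buf, 0, $n ];
    }
}

my $sth = MockSth->new;
$sth->_arrive([1, 'a'], [2, 'b'], [3, 'c']);
my $first = $sth->fetchavail_arrayref(undef, 2);   # two rows
my $rest  = $sth->fetchavail_arrayref;             # the remaining row
my $none  = $sth->fetchavail_arrayref;             # empty, and does not block
```

An empty array ref means "nothing has arrived yet", which the caller tells apart from end-of-data by checking more() or done().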

Implementing the ready() method with threads will be easy.  Implementing
ready() without threads will be tricky but possible, when one has
access to the communications layer.

Implementation suggestions:

I expect that the best way to add this extension to your module
would be to write a nonblocking module separate from the main regular
module, and have the connect method return an object of the nonblocking
type when $attr{Timeout} exists.

no threads:
 Include all currently pending operations on all handles of this type
 of database in a set that is checked whenever ready() is called on
 a pending object.
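
 A sketch of that no-threads strategy under some loud assumptions: the
 registry, new_pending, and the "one newline-terminated line is a complete
 answer" rule are all made up for illustration, and socket pairs stand in
 for server connections.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical driver-wide registry of pending operations, keyed by fileno.
my %pending;
my $select = IO::Select->new;

# Create a pending operation on $sock and add it to the shared set.
sub new_pending {
    my ($sock) = @_;
    my $op = { sock => $sock, buf => '', ready => 0 };
    $pending{ fileno $sock } = $op;
    $select->add($sock);
    return $op;
}

# ready() on any one operation services every pending operation.
sub ready {
    my ($op) = @_;
    for my $sock ($select->can_read(0)) {          # poll, never block
        my $p = $pending{ fileno $sock } or next;
        sysread($sock, my $chunk, 4096) or next;   # EOF/error ignored in this sketch
        $p->{buf} .= $chunk;
        next unless $p->{buf} =~ /\n/;             # assumption: one line = one answer
        $p->{ready} = 1;
        $select->remove($sock);
        delete $pending{ fileno $sock };
    }
    return $op->{ready};
}

# Two socket pairs stand in for two server connections.
socketpair(my $c1, my $s1, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socketpair: $!";
socketpair(my $c2, my $s2, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socketpair: $!";
my ($op1, $op2) = (new_pending($c1), new_pending($c2));

syswrite($s2, "done\n");     # only the second server has answered
my $first  = ready($op1);    # false, but op2's data was collected on the way
my $second = ready($op2);    # true
```

 Asking about one handle advances all of them, so a polling loop over many
 statement handles never starves any single one.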

threads:
 launch a thread with each statement handle. The thread responds to
 incoming socket data by writing it into a buffer, and when there
 is enough there, the thread sets the statement's ready attribute. All
 ready() has to do is return the ready attribute.

David Nicol
