Re: Fwd: Idea for a Gofer transport for translating SQL - primarily intended for testing

Andrew Ford Sun, 11 Sep 2011 15:26:18 -0700

I suggest that we move this discussion to dbi-dev.

On 11/09/11 18:14, Tim Bunce wrote:

On Fri, Sep 09, 2011 at 03:50:53PM +0100, Andrew Ford wrote:

Currently DBD::Gofer::dr:connect() contains code to set up the transport:


            my $transport_class = delete $go_attr{go_transport}
                or return $drh->set_err($DBI::stderr, "No transport argument in 
'$orig_dsn'");
            $transport_class = "DBD::Gofer::Transport::$transport_class"
                unless $transport_class =~ /::/;
            _load_class($transport_class)
                or return $drh->set_err($DBI::stderr, "Can't load $transport_class: 
$@");
            my $go_transport = eval { $transport_class->new(\%go_attr) }
                or return $drh->set_err($DBI::stderr, "Can't instanciate 
$transport_class: $@");

I would imagine that we could have a "middleware" keyword that was a
comma-separated list of names that were assumed to be in the
DBDx::GoferMiddleware namespace - these would be installed in
sequence with a similar block of code that follows on from the
snippet above, looking something like:

            my $middleware_classes = delete $go_attr{go_middleware};
            foreach my $middleware_class (split /,/, $middleware_classes) {
                $middleware_class = "DBDx::GoferMiddleware::$middleware_class"
                    unless $middleware_class =~ /::/;
                _load_class($middleware_class)
                    or return $drh->set_err($DBI::stderr, "Can't load 
$middleware_class: $@");
                $go_transport->middleware( sub { return 
$middleware_class->HANDLER(shift, $go_transport->middleware) } );

My code was wrong, what I meant was:

            my $next_middleware = $go_transport->middleware;

$go_transport->middleware( sub { return$middleware_class->handler(shift, $next_middleware) } );

as in the original code the reference in the closure to $go_transportwould refer to its value when the closure was defined, but invoking themiddleware accessor on that reference would return the middleware valueas it is when the closure is invoked - not when it was defined, whereascalculating $next_middleware when the closure is defined means that itwill still point to the value of the previous middleware (as it was whenthe closure was defined) when it is later used as the closure is invoked.

Each middleware handler would take a request object and a reference
to the next handler in the chain.

Calling $go_transport->middleware within the sub isn't right if we go
with a chain of closures, which I'd prefer.  I'd expect that last line
to be something like:

     $go_transport->middleware( $middleware_class->new_handler( 
$go_transport->middleware ) );

where new_handler would look something like:

     sub new_handler {
         my ($class, $existing_handler) = @_;

         my $new_handler = sub {
             my ($self, $request) = @_;
             ...do something with $request...
             my $response = $existing_handler->($self, $request);
             ...do something with $response...
             return $response;
         } );

         return $new_handler;
     }

So at setup-time a chain of closures is created.

The difference between the two approaches is whether the middlewareclass provides a new_handler method that builds the closure, or whetherthe middleware class provides a handler method that is included in aclosure that the Gofer class constructs. The latter approach wouldinvolve an extra subroutine call per middleware layer at runtime (as youwould end up with a closure calling the middleware handler that callsthe next closure...). Your approach is probably cleaner, but it doesput the onus on middleware writers to build the closures properly(although that can be mitigated against with good documentation thatprovides example code for writing new_hander).

The DSNs would then look like:

     
dbi:Gofer:transport=null;middleware=adaptor;transform=MyMapper;dsn=dbi:XXX...

I don't see a need for both middleware= and transform=. Once the
middleware concept is in Gofer then middleware=MyMapper would suffice.
(Unless I'm missing something.)

I wasn't clear here - I meant that in my original idea I would have amiddleware component called "adaptor" that took a parameter called"MyMapper" (this was the name used in my original example - but it couldhave been anything). Thinking about it further I got to the idea thatit should be something like (using the same component/parameter names):

dbi:Gofer:transport=null;middleware=adaptor(transform=MyMapper);dsn=dbi:XXX...


(which is the syntax that you suggest in your response, below)

Then thinking about my stored-procedure-to-simple-SQL adaptor, Irealized that that is then application-specific and wouldn't besomething to publish to CPAN, so the example just becomes:


    dbi:Gofer:transport=null;middleware=MyApp::MyMapper;dsn=dbi:XXX

As I would probably want to use a class name in my own class hierarchyfor my (application-specific) middleware.

There are a few holes in this that I can think of:

    * We might want to instantiate a middleware object for each
      component and use that in installing the middleware handler

I don't see a need. Got a use case?

No - I was just not that confident in my component design abilities andwas trying to be overly general.

    * I am not sure how we could specify a handler name other than the
      default for individual middleware components

I don't see a need. Got a use case?

Same as above - probably no need.

    * Also not sure how attributes for each middleware component could
      be specified such that the component for which the attribute is
      intended could be specified - the example DSN above assumes that
      the attribute (transform in the example DSN) is just dumped into
      the general "go_" attribute hash)

Some smarter parsing should make this style possible:

     
dbi:Gofer:transport=null;middleware=Foo(a=1,b=2),Bar(a=2,b=1);dsn=dbi:XXX...

Anything more fancy could be done by writing a new module with whatever
logic is needed.

Agreed - the smarter parsing is quite straightforward.

I wondered about passing the arguments as a string to new_handler -leaving it to the middleware module to parse the argument string; thiswould allow for modules to be specified as taking positional arguments(especially useful if they only take one argument), e.g.:

dbi:Gofer:transport=null;middleware=delay(5),module2(a=1,b=2);dsn=dbi:XXX...

having to include something like "delay(delay=5)" if that is the onlyparameter strikes me as ugly - although I suppose it avoids ambiguity ifthe component introduces new parameters later. So I could go either wayon this - either having Gofer parse the keyword/value pairs and pass ahashref to the handler constructor, or leave them unparsed and pass astring to the constructor.

    * The go_middleware attribute could be set in the DBI connect
      attribute hash to a list of subroutine references (code would need
      to differentiate between a comma-separated list value and a list
      reference value)

Umm, firstly having a go_middleware attribute that's different to the
$go_transport->middleware attribute is going to get confusing.

Let's keep 'middleware' as the name for the concept but rename the
go_middleware attribute to go_via. That's shorter and more descriptive:

     dbi:Gofer:transport=null;via=MyMapper;dsn=dbi:XXX...

And rename $go_transport->middleware to $go_transport->sender.
That's also shorter and actually more descriptive since it is just a
sender (the fact that the sender code ref may reference layers of
middleware is a separate issue).

Let's ignore the semantics of go_via being a reference for now.
Doesn't seem like a priority.

OK - sounds fine to me.

    * Should the middleware components be listed innermost first (as
      implemented in the pseudo-code above) or outermost first?

Innermost first would match the implementation, i.e., when read left to
right each is wrapping the previous (where 'previous' for the first one
is the transport itself).

I think that outermost might be more intuitive, e.g.:

dbi:Gofer:transport=null;via=delay(5),MyMapper,logger(output=syslog);dsn=dbi:XXX...

reads to me like it delays 5 seconds, invokes MyMapper, then logs tosyslog. Implementing that interpretation is, of course, just a matterof popping the components off the string rather than shifting them off -or "reverse split(...)"

There is still the issue that the middleware gets invoked before thetransport... sort of - actually it is the transport that is invokingthe chain of closures that make up the middleware. Anyway,linguistically with the DSN I am saying that "I want the null transportto be used... and by the way I want it to pass the request through theseitems of middleware" - so I think that outermost-first would be what Iat least would expect. Of course one could model it as filters, ratherthan layers, and have pre=... and post=... components but that would bemore tricky to implement

    * We might want to add some convenience methods to the request class
      to build response objects (as suggested in my
      DBD::Gofer::Transport::adaptor code)

Maybe ensure the relevant server classes can be used safely on the client side.
Adding some docs to http://search.cpan.org/perldoc?DBI::Gofer::Response
would be a good start.


Yep.  Docs are always nice ;-)

Some light refactoring might be needed. Specifically new_response_with_err
in DBI::Gofer::Execute should probably move into DBI::Gofer::Response.

Yep

Other server-specific stuff in DBI::Gofer::Execute might need to move out
(perhaps to a DBI::Gofer::ServerExecute subclass) so it can be used as a
base class for middlewares that want to appear to 'execute' some requests.
Certainly middlewares ought to inherit from a base class supplied by the
DBI to enable safer evolution.

Yes (he says, nodding sagely but thinking that he really needs to studythe code in much more detail).

Should the middleware inherit from DBD::GoferMiddleware::Base orDBD::Gofer::Middleware::Base or what? And what class should we use asthe default class prefix? Are we still saying that unqualifiedmiddleware classes are assumed to have a prefix of DBDx::GoferMiddleware?

I would like to have some methods that can be called without having tounderstand the layout of the request and response objects. The requestobject actually doesn't seem to be too complex to deal with (especiallyif the documentation is expanded), but it feels like there are too manythings to be aware of with the response object, and if one hascalculated a result set somehow, it would be easier to say:


  $response = DBI::Gofer::Response->new_response_from_data(
               $rv, $@, $dbh,
               [ 'code integer not null', 'description varchar(255)' ],
               [ [ 42, 'the ultimate answer' ],
                 [ 22, 'you can, but you can\'t' ] ] )

rather than

  $response = DBI::Gofer::Response->new_response_with_err();
  $response->gather_sth_result_sets(
            [ { NAME => [ 'code, 'description' ],
                 TYPE => [ 'integer', 'varchar(255)' ],
                 NULLABLE => [ 1, 0 ],
                 rowset => [ [ 42, 'the ultimate answer' ],

[ 22, 'you can, but you can\'t' ],] ] } );


(and I am sure that I have got the second version wrong)

Adding some methods like the suggested 'new_response_from_data' wouldmake writing middleware less arcane.

I quite like the idea of specifying the column headers with SQL columndefinitions and then letting the Gofer code work out how that should berepresented internally. One could even special case the case of asingle row, so that it becomes:


           $response = DBI::Gofer::Response->new_response_from_data(
               $rv, $@, $dbh,
               [ 'code integer not null', 'description varchar(255)' ],
               [ 42, 'the ultimate answer' ] );

There's probably a load of intracacies that I am not aware of inbuilding responses (and I do know that a response can return multiplerow sets), but I am sure that something sensible and DWIM-ey could bethought out.

I'll have a go at getting this coded over the next few days, but it
looks as if the changes required to implement middleware are
relatively trivial.

Yeah! Many thanks for doing this Andrew. It's a great addition to Gofer.


It is scratching my itch!

I've got some code included in Gofer and Transport::Base and it doesn'tbreak the unit tests. The suggestions we've discussed will involve somechanges, but not many. I don't see any of it being particularlydifficult - probably the thing requiring most thought is coming up withsome new unit tests that thoroughly exercise the new code, incombination with other transports as well as the new "passthrough"transport.

Your module could then use this mechanism. If you wanted to release it
to CPAN then a name like DBDx::GoferMiddleware::FOO would be good.
(Note the DBDx not DBIx since this is client-side. We may well end up
with server-side gofer middlewares as well.)

I think I would take the name DBDx::GoferMiddleware::adaptor for my
transformation middleware, unless anyone has a better idea

Umm, adaptor seems too generic. Has almost no meaning out of context.
You'd have to make the logic *much* more generic for the name to fit.
A name like RewriteSPcalls would be more descriptive of the current logic.

As I said earlier, with a middleware interface in place, thetransformation becomes completely application specific, so does not needto be published.

However I may want to publish a reworked transport that implementsmiddleware in the transport - and mark it as deprecated - as it would beuseful and would work with versions of DBI that had Gofer, but did nothave the middleware feature. I know that many companies do not upgrademodules like DBI for years, and a middleware-in-a-transport module wouldbe useful for them (I work on some systems with CentOS 5 and that onlyhas DBI 1.52 - no Gofer even).

(are we keeping the final class name component for the middleware
class in lower case, as it is with the transport classes?).

I think I'd prefer a capitalized name for these.

How does one cope with capitalized class names and lower case componentnames - or are you saying that we suggest that middleware writers usecapitalized class names and thus the names listed with "via=" would bein upper case (to distinguish them from transport names?)?

If we have a DBD::Gofer middleware feature then it would be helpful to
add a new passthrough (or "passthru") transport.  This would be based
on the null transport, but without the serialization/deserialization,
i.e. something as simple as:

For efficiency reasons the DBI::Gofer::Execute class assumes it can
modify elements of the request object in-place. The serialization is
needed to avoid subtle bugs that that would otherwise cause. I looked
into 'fixing' that some years ago but there wasn't a good use case then
- the null transport was only ever used for testing. It's probably not
worth trying to fix though. The overhead of serialization/deserialization
is probably (I'm guessing) a relatively small part of the overall
overhead of using Gofer.

Feel free to try, but I wouldn't make it a priority :)

I noticed those "subtle bugs" when I added the "passthrough" transportto the DBI tests (85gofer.t). I fixed them by using Storable::dclone()on the request; the response does not need to beserialized/deserialized. I am also wondering whether this transportshould be connection-oriented, as it is going to be used to proxy toother local DSNs, so I can't see the point in re-connecting for eachrequest. I see the "passthrough" transport as being moreproduction-quality than the "null" transport.


Andrew

--
Andrew Ford
South Wing Compton House, Compton Green,
Redmarley, Gloucestershire, GL19 3JB, UK
Tel:    +44 1531 829900
Mobile: +44 7785 258278
Email:  a.f...@ford-mason.co.uk

Re: Fwd: Idea for a Gofer transport for translating SQL - primarily intended for testing

Reply via email to