I suggest that we move this discussion to dbi-dev.
On 11/09/11 18:14, Tim Bunce wrote:
On Fri, Sep 09, 2011 at 03:50:53PM +0100, Andrew Ford wrote:
Currently DBD::Gofer::dr:connect() contains code to set up the transport:
my $transport_class = delete $go_attr{go_transport}
or return $drh->set_err($DBI::stderr, "No transport argument in
'$orig_dsn'");
$transport_class = "DBD::Gofer::Transport::$transport_class"
unless $transport_class =~ /::/;
_load_class($transport_class)
or return $drh->set_err($DBI::stderr, "Can't load $transport_class:
$@");
my $go_transport = eval { $transport_class->new(\%go_attr) }
or return $drh->set_err($DBI::stderr, "Can't instanciate
$transport_class: $@");
I would imagine that we could have a "middleware" keyword that was a
comma-separated list of names that were assumed to be in the
DBDx::GoferMiddleware namespace - these would be installed in
sequence with a similar block of code that follows on from the
snippet above, looking something like:
my $middleware_classes = delete $go_attr{go_middleware};
foreach my $middleware_class (split /,/, $middleware_classes) {
$middleware_class = "DBDx::GoferMiddleware::$middleware_class"
unless $middleware_class =~ /::/;
_load_class($middleware_class)
or return $drh->set_err($DBI::stderr, "Can't load
$middleware_class: $@");
$go_transport->middleware( sub { return
$middleware_class->HANDLER(shift, $go_transport->middleware) } );
My code was wrong, what I meant was:
my $next_middleware = $go_transport->middleware;
$go_transport->middleware( sub { return
$middleware_class->handler(shift, $next_middleware) } );
as in the original code the reference in the closure to $go_transport
would refer to its value when the closure was defined, but invoking the
middleware accessor on that reference would return the middleware value
as it is when the closure is invoked - not when it was defined, whereas
calculating $next_middleware when the closure is defined means that it
will still point to the value of the previous middleware (as it was when
the closure was defined) when it is later used as the closure is invoked.
Each middleware handler would take a request object and a reference
to the next handler in the chain.
Calling $go_transport->middleware within the sub isn't right if we go
with a chain of closures, which I'd prefer. I'd expect that last line
to be something like:
$go_transport->middleware( $middleware_class->new_handler(
$go_transport->middleware ) );
where new_handler would look something like:
sub new_handler {
my ($class, $existing_handler) = @_;
my $new_handler = sub {
my ($self, $request) = @_;
...do something with $request...
my $response = $existing_handler->($self, $request);
...do something with $response...
return $response;
} );
return $new_handler;
}
So at setup-time a chain of closures is created.
The difference between the two approaches is whether the middleware
class provides a new_handler method that builds the closure, or whether
the middleware class provides a handler method that is included in a
closure that the Gofer class constructs. The latter approach would
involve an extra subroutine call per middleware layer at runtime (as you
would end up with a closure calling the middleware handler that calls
the next closure...). Your approach is probably cleaner, but it does
put the onus on middleware writers to build the closures properly
(although that can be mitigated against with good documentation that
provides example code for writing new_hander).
The DSNs would then look like:
dbi:Gofer:transport=null;middleware=adaptor;transform=MyMapper;dsn=dbi:XXX...
I don't see a need for both middleware= and transform=. Once the
middleware concept is in Gofer then middleware=MyMapper would suffice.
(Unless I'm missing something.)
I wasn't clear here - I meant that in my original idea I would have a
middleware component called "adaptor" that took a parameter called
"MyMapper" (this was the name used in my original example - but it could
have been anything). Thinking about it further I got to the idea that
it should be something like (using the same component/parameter names):
dbi:Gofer:transport=null;middleware=adaptor(transform=MyMapper);dsn=dbi:XXX...
(which is the syntax that you suggest in your response, below)
Then thinking about my stored-procedure-to-simple-SQL adaptor, I
realized that that is then application-specific and wouldn't be
something to publish to CPAN, so the example just becomes:
dbi:Gofer:transport=null;middleware=MyApp::MyMapper;dsn=dbi:XXX
As I would probably want to use a class name in my own class hierarchy
for my (application-specific) middleware.
There are a few holes in this that I can think of:
* We might want to instantiate a middleware object for each
component and use that in installing the middleware handler
I don't see a need. Got a use case?
No - I was just not that confident in my component design abilities and
was trying to be overly general.
* I am not sure how we could specify a handler name other than the
default for individual middleware components
I don't see a need. Got a use case?
Same as above - probably no need.
* Also not sure how attributes for each middleware component could
be specified such that the component for which the attribute is
intended could be specified - the example DSN above assumes that
the attribute (transform in the example DSN) is just dumped into
the general "go_" attribute hash)
Some smarter parsing should make this style possible:
dbi:Gofer:transport=null;middleware=Foo(a=1,b=2),Bar(a=2,b=1);dsn=dbi:XXX...
Anything more fancy could be done by writing a new module with whatever
logic is needed.
Agreed - the smarter parsing is quite straightforward.
I wondered about passing the arguments as a string to new_handler -
leaving it to the middleware module to parse the argument string; this
would allow for modules to be specified as taking positional arguments
(especially useful if they only take one argument), e.g.:
dbi:Gofer:transport=null;middleware=delay(5),module2(a=1,b=2);dsn=dbi:XXX...
having to include something like "delay(delay=5)" if that is the only
parameter strikes me as ugly - although I suppose it avoids ambiguity if
the component introduces new parameters later. So I could go either way
on this - either having Gofer parse the keyword/value pairs and pass a
hashref to the handler constructor, or leave them unparsed and pass a
string to the constructor.
* The go_middleware attribute could be set in the DBI connect
attribute hash to a list of subroutine references (code would need
to differentiate between a comma-separated list value and a list
reference value)
Umm, firstly having a go_middleware attribute that's different to the
$go_transport->middleware attribute is going to get confusing.
Let's keep 'middleware' as the name for the concept but rename the
go_middleware attribute to go_via. That's shorter and more descriptive:
dbi:Gofer:transport=null;via=MyMapper;dsn=dbi:XXX...
And rename $go_transport->middleware to $go_transport->sender.
That's also shorter and actually more descriptive since it is just a
sender (the fact that the sender code ref may reference layers of
middleware is a separate issue).
Let's ignore the semantics of go_via being a reference for now.
Doesn't seem like a priority.
OK - sounds fine to me.
* Should the middleware components be listed innermost first (as
implemented in the pseudo-code above) or outermost first?
Innermost first would match the implementation, i.e., when read left to
right each is wrapping the previous (where 'previous' for the first one
is the transport itself).
I think that outermost might be more intuitive, e.g.:
dbi:Gofer:transport=null;via=delay(5),MyMapper,logger(output=syslog);dsn=dbi:XXX...
reads to me like it delays 5 seconds, invokes MyMapper, then logs to
syslog. Implementing that interpretation is, of course, just a matter
of popping the components off the string rather than shifting them off -
or "reverse split(...)"
There is still the issue that the middleware gets invoked before the
transport... sort of - actually it is the transport that is invoking
the chain of closures that make up the middleware. Anyway,
linguistically with the DSN I am saying that "I want the null transport
to be used... and by the way I want it to pass the request through these
items of middleware" - so I think that outermost-first would be what I
at least would expect. Of course one could model it as filters, rather
than layers, and have pre=... and post=... components but that would be
more tricky to implement
* We might want to add some convenience methods to the request class
to build response objects (as suggested in my
DBD::Gofer::Transport::adaptor code)
Maybe ensure the relevant server classes can be used safely on the client side.
Adding some docs to http://search.cpan.org/perldoc?DBI::Gofer::Response
would be a good start.
Yep. Docs are always nice ;-)
Some light refactoring might be needed. Specifically new_response_with_err
in DBI::Gofer::Execute should probably move into DBI::Gofer::Response.
Yep
Other server-specific stuff in DBI::Gofer::Execute might need to move out
(perhaps to a DBI::Gofer::ServerExecute subclass) so it can be used as a
base class for middlewares that want to appear to 'execute' some requests.
Certainly middlewares ought to inherit from a base class supplied by the
DBI to enable safer evolution.
Yes (he says, nodding sagely but thinking that he really needs to study
the code in much more detail).
Should the middleware inherit from DBD::GoferMiddleware::Base or
DBD::Gofer::Middleware::Base or what? And what class should we use as
the default class prefix? Are we still saying that unqualified
middleware classes are assumed to have a prefix of DBDx::GoferMiddleware?
I would like to have some methods that can be called without having to
understand the layout of the request and response objects. The request
object actually doesn't seem to be too complex to deal with (especially
if the documentation is expanded), but it feels like there are too many
things to be aware of with the response object, and if one has
calculated a result set somehow, it would be easier to say:
$response = DBI::Gofer::Response->new_response_from_data(
$rv, $@, $dbh,
[ 'code integer not null', 'description varchar(255)' ],
[ [ 42, 'the ultimate answer' ],
[ 22, 'you can, but you can\'t' ] ] )
rather than
$response = DBI::Gofer::Response->new_response_with_err();
$response->gather_sth_result_sets(
[ { NAME => [ 'code, 'description' ],
TYPE => [ 'integer', 'varchar(255)' ],
NULLABLE => [ 1, 0 ],
rowset => [ [ 42, 'the ultimate answer' ],
[ 22, 'you can, but you can\'t' ],
] ] } );
(and I am sure that I have got the second version wrong)
Adding some methods like the suggested 'new_response_from_data' would
make writing middleware less arcane.
I quite like the idea of specifying the column headers with SQL column
definitions and then letting the Gofer code work out how that should be
represented internally. One could even special case the case of a
single row, so that it becomes:
$response = DBI::Gofer::Response->new_response_from_data(
$rv, $@, $dbh,
[ 'code integer not null', 'description varchar(255)' ],
[ 42, 'the ultimate answer' ] );
There's probably a load of intracacies that I am not aware of in
building responses (and I do know that a response can return multiple
row sets), but I am sure that something sensible and DWIM-ey could be
thought out.
I'll have a go at getting this coded over the next few days, but it
looks as if the changes required to implement middleware are
relatively trivial.
Yeah! Many thanks for doing this Andrew. It's a great addition to Gofer.
It is scratching my itch!
I've got some code included in Gofer and Transport::Base and it doesn't
break the unit tests. The suggestions we've discussed will involve some
changes, but not many. I don't see any of it being particularly
difficult - probably the thing requiring most thought is coming up with
some new unit tests that thoroughly exercise the new code, in
combination with other transports as well as the new "passthrough"
transport.
Your module could then use this mechanism. If you wanted to release it
to CPAN then a name like DBDx::GoferMiddleware::FOO would be good.
(Note the DBDx not DBIx since this is client-side. We may well end up
with server-side gofer middlewares as well.)
I think I would take the name DBDx::GoferMiddleware::adaptor for my
transformation middleware, unless anyone has a better idea
Umm, adaptor seems too generic. Has almost no meaning out of context.
You'd have to make the logic *much* more generic for the name to fit.
A name like RewriteSPcalls would be more descriptive of the current logic.
As I said earlier, with a middleware interface in place, the
transformation becomes completely application specific, so does not need
to be published.
However I may want to publish a reworked transport that implements
middleware in the transport - and mark it as deprecated - as it would be
useful and would work with versions of DBI that had Gofer, but did not
have the middleware feature. I know that many companies do not upgrade
modules like DBI for years, and a middleware-in-a-transport module would
be useful for them (I work on some systems with CentOS 5 and that only
has DBI 1.52 - no Gofer even).
(are we keeping the final class name component for the middleware
class in lower case, as it is with the transport classes?).
I think I'd prefer a capitalized name for these.
How does one cope with capitalized class names and lower case component
names - or are you saying that we suggest that middleware writers use
capitalized class names and thus the names listed with "via=" would be
in upper case (to distinguish them from transport names?)?
If we have a DBD::Gofer middleware feature then it would be helpful to
add a new passthrough (or "passthru") transport. This would be based
on the null transport, but without the serialization/deserialization,
i.e. something as simple as:
For efficiency reasons the DBI::Gofer::Execute class assumes it can
modify elements of the request object in-place. The serialization is
needed to avoid subtle bugs that that would otherwise cause. I looked
into 'fixing' that some years ago but there wasn't a good use case then
- the null transport was only ever used for testing. It's probably not
worth trying to fix though. The overhead of serialization/deserialization
is probably (I'm guessing) a relatively small part of the overall
overhead of using Gofer.
Feel free to try, but I wouldn't make it a priority :)
I noticed those "subtle bugs" when I added the "passthrough" transport
to the DBI tests (85gofer.t). I fixed them by using Storable::dclone()
on the request; the response does not need to be
serialized/deserialized. I am also wondering whether this transport
should be connection-oriented, as it is going to be used to proxy to
other local DSNs, so I can't see the point in re-connecting for each
request. I see the "passthrough" transport as being more
production-quality than the "null" transport.
Andrew
--
Andrew Ford
South Wing Compton House, Compton Green,
Redmarley, Gloucestershire, GL19 3JB, UK
Tel: +44 1531 829900
Mobile: +44 7785 258278
Email: a.f...@ford-mason.co.uk