Background: A couple of years ago I wrote a complete Perl input/output
filtering system much like the current Apache filter system and a bunch
of filters for it. I'm hoping to port it to mod_perl-2.0 and let
mod_perl do the heavy lifting (filters are just a pain to manage).
Having done a bunch of filtering stuff, I've found that a large number of
"real" filters need to be cognizant of BOS/middle/EOS (i.e. external
state) issues, and will occasionally need to manually send EOSs. Some
will also need to maintain internal state (even if just the tail end of
whatever input they've not processed). Since the same filter can be
added multiple times and each get different state (as mod_perl-2.0 does
today), OO Perl offers a nice API for filters, though a little magic is
required since Apache filters have only one callback.
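To make the "tail end of unprocessed input" point concrete, here is a minimal plain-Perl sketch of a line-oriented helper that keeps a partial line between calls. It does not touch mod_perl at all; TailKeeper, feed() and flush() are hypothetical names invented for illustration.

```perl
use strict;
use warnings;

package TailKeeper;    # hypothetical class name, for illustration only

sub new {
    my $class = shift;
    return bless { tail => '' }, $class;
}

# Accept one chunk; return the complete lines seen so far and keep
# any trailing partial line for the next call.
sub feed {
    my ( $self, $chunk ) = @_;
    my $data = $self->{tail} . $chunk;
    my @lines;
    push @lines, $1 while $data =~ s/\A([^\n]*\n)//;
    $self->{tail} = $data;
    return @lines;
}

# At end of stream, hand back whatever was left unterminated.
sub flush {
    my $self = shift;
    my $rest = $self->{tail};
    $self->{tail} = '';
    return $rest;
}

package main;

my $keeper = TailKeeper->new;
my @out;
push @out, $keeper->feed($_) for "one\ntw", "o\nthr", "ee";
push @out, $keeper->flush;
print @out, "\n";
```

A real filter would call something like feed() once per bucket brigade and flush() when the EOS shows up, which is exactly why it needs both internal state and EOS awareness.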
NOTE: None of this is about the proposal for higher level
Perl{In,Out}putFilterHandler padding in my previous mail. This is all
only about bucket brigade filters.
To support a one-handler API like today's, Apache::Filter needs is_BOS()
and is_EOS() accessors added. That way a simple filter can be:
    my $count ;
    sub handler: MP_INPUT_FILTER {
        my $filter = shift ;
        if ( $filter->is_BOS() ) {
            $count = 0 ;
            $filter->print( "header\n" ) ;
        }
        while ( $filter->read( my $buffer ) ) {
            $count += length $buffer ;
            $filter->print( ... ) ;
        }
        if ( $filter->is_EOS() ) {
            $filter->print( "$count bytes filtered\n" ) ;
        }
    }
NOTE: There's no underlying BOS cookie that corresponds to the EOS
cookie; the bucket brigade API assumes that you'll cook up the context
before calling add_xxx_filter(), rather than providing a callback to do
initialization. It's a bit annoyingly asymmetric. 'course I don't
like magic cookies like EOS, they tend to get dropped by bugs all too
frequently and they introduce special cases in all of the code, so I'll
just shut up about the lack of a BOS :-).
That filter breaks if it's added twice, because both instances share the
one file-scoped $count; caveat Pusher. Perhaps we should propose an
Apache filtering mod that adds an "on_add" handler to the
ap_register_xx_filter() calls, since that's a useful place to do
double-add prevention and state initialization. It would make it easier
to encapsulate filters and expose them properly, allowing
SetOutputFilter directives to cause a filter init event to occur.
Adding an Apache::Filter::send_EOS() would also allow a filter to purge
Apache's filter chain to a network buffer and hang around doing cleanup
or consuming more input.
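Here is a rough plain-Perl simulation of the double-add problem, assuming nothing about mod_perl itself. shared_filter() mimics the file-scoped $count above, and make_filter() is a hypothetical helper showing what per-instance state buys you.

```perl
use strict;
use warnings;

# File-scoped state, as in the one-handler example: every addition of the
# filter shares this single variable.
my $count;

sub shared_filter {
    my ( $is_bos, $data ) = @_;
    $count = 0 if $is_bos;
    $count += length $data;
    return $count;
}

# Per-instance state: each call to make_filter() (a hypothetical helper)
# returns a closure with its own private counter.
sub make_filter {
    my $own = 0;
    return sub {
        my ($data) = @_;
        $own += length $data;
        return $own;
    };
}

# Simulate the same filter added twice, each instance seeing "abc" then "de".
shared_filter( 1, "abc" );    # instance A, first brigade
shared_filter( 1, "abc" );    # instance B's BOS resets the shared count
my $shared_a = shared_filter( 0, "de" );
my $shared_b = shared_filter( 0, "de" );

my ( $fa, $fb ) = ( make_filter(), make_filter() );
$fa->("abc");
$fb->("abc");
my $own_a = $fa->("de");
my $own_b = $fb->("de");

print "shared: $shared_a $shared_b, per-instance: $own_a $own_b\n";
```

Each instance saw five bytes, but the shared counter can't tell the two apart, so the totals drift. An "on_add" hook would be one natural place to create the per-instance state.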
Now for the OO insanity...
A natural way to accommodate both the BOS/EOS and internal state would be
to allow Apache::Filter *instances* to be subclassed. Here's an example:
    package Algae::Filter ;
    use base qw(Apache::Filter) ;

    sub new {
        ## Called manually (see below) before manually calling
        ## ap_add_{in,out}put_filter() or just before handle_BOS()
        ## is called if a filter named "Algae::Filter" is added
        ## using, say SetOutputFilter.
        my $proto = shift ;
        my $class = ref $proto || $proto ;
        my $self = $class->SUPER::new() ;
        ... init internal state...
        return $self ;
    }

    sub handle_BOS {
        my $self = shift ;
        ## called when first bucket brigade arrives, just before the
        ## first call to handler().
        return APR_SUCCESS ; ## Or not...
    }

    sub handler: AP_FTYPE_FOO, MP_INPUT_FILTER {
        ## Defaults to AP_FTYPE_CONTENT and MP_OUTPUT_FILTER
        ## Might we call this handle_content()???
        my $self = shift ;
        my ( $bb ) = @_ ; ## Optional
        ... process input, possibly send EOS ...
        return APR_SUCCESS ;
    }

    sub handle_EOS {
        my $self = shift ;
        ## Called after the last call to handler().
        return APR_SUCCESS ;
    }
Two common ways of adding this type of filter would be by using the
SetOutputFilter directive or by manually calling ap_add_output_filter().
In the SetOutputFilter case, nothing would happen until the first bucket
brigade arrived at modperl_output_filter_handler()'s gate. At that time
mod_perl would notice that no object had been built and would try to
call Algae::Filter->new(), setting the ctx->perl_ctx field to its
result.
If no new() existed, perl_ctx would not be set. In this case the
Apache::Filter would be passed in to the other subs as-is. This is like
today's behavior, and would be what happens for simple cases.
After calling new(), mod_perl would try to call handle_BOS(), passing in
either perl_ctx (i.e. $self) or the plain ol' Apache::Filter depending on
whether new() returned anything.
Then, handler() would get called with the first arg ($self or the
Apache::Filter) and the first bucket brigade. It would
also be called for each additional bucket brigade that contained data
(other than an EOS).
When the EOS-tailed bucket brigade arrived, handler() would be called
only if the brigade had additional data before the EOS, and then
handle_EOS() would be called with the same first arg (and no bucket
brigade). If the filter didn't send_EOS() at some point manually,
mod_perl would now send the EOS.
The default Apache::Filter::handler would be a pass-through.
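The calling sequence described above can be sketched in plain Perl. This is only a simulation under stated assumptions: a brigade is modelled as a hash with a data list and an eos flag, run_filter() and Demo::Filter are hypothetical names, the $ctx variable stands in for perl_ctx, and the no-new() fallback is left out for brevity. It also simplifies by calling handler() only for brigades that carry data.

```perl
use strict;
use warnings;

package Demo::Filter;    # hypothetical filter class that logs each event

sub new {
    my $class = shift;
    return bless { log => [] }, $class;
}

sub handle_BOS {
    my $self = shift;
    push @{ $self->{log} }, 'BOS';
    return 0;
}

sub handler {
    my ( $self, $bb ) = @_;
    push @{ $self->{log} }, 'data:' . join( '', @{ $bb->{data} } );
    return 0;
}

sub handle_EOS {
    my $self = shift;
    push @{ $self->{log} }, 'EOS';
    return 0;
}

package main;

# Hypothetical driver playing the role of modperl_output_filter_handler().
sub run_filter {
    my ( $class, @brigades ) = @_;
    my $ctx;    # stands in for mod_perl's per-filter perl_ctx slot
    for my $bb (@brigades) {
        if ( !$ctx ) {
            $ctx = $class->new;    # built lazily on the first brigade
            $ctx->handle_BOS;
        }
        $ctx->handler($bb) if @{ $bb->{data} };
        $ctx->handle_EOS   if $bb->{eos};
    }
    return $ctx;
}

my $ctx = run_filter(
    'Demo::Filter',
    { data => ['ab'], eos => 0 },
    { data => ['cd'], eos => 0 },
    { data => [],     eos => 1 },
);
print join( ',', @{ $ctx->{log} } ), "\n";
```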
The reason for separate BOS and EOS handlers is to support
specialization through inheritance. If we don't want to burden all
bucket brigade filters this way, then perhaps we can use an adapter
class that provides a handler and calls (handle_BOS(), handle_content(),
and handle_EOS()) based on the new Apache::Filter::is_BOS() and
...::is_EOS() accessors.
NOTE: neither handle_BOS() nor handle_EOS() would be passed the bucket
brigade. All three event handlers would get $self as the first param
(or the default Apache::Filter object if new() was not called).
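For what it's worth, the adapter idea might look roughly like this. It's a sketch against a mock, not the real thing: Mock::Filter is a hypothetical stand-in for an Apache::Filter that already has the proposed is_BOS()/is_EOS() accessors, Adapter::Filter and Counting::Filter are made-up class names, and a "brigade" is just a string here.

```perl
use strict;
use warnings;

package Mock::Filter;    # hypothetical stand-in for Apache::Filter

sub new {
    my $class = shift;
    return bless { bos => 0, eos => 0, out => [] }, $class;
}
sub is_BOS { return $_[0]{bos} }
sub is_EOS { return $_[0]{eos} }
sub print  { my $self = shift; push @{ $self->{out} }, @_; return 1 }

package Adapter::Filter;    # hypothetical adapter: one handler, three events
use parent -norequire, 'Mock::Filter';

sub handler {
    my ( $self, $bb ) = @_;
    $self->handle_BOS if $self->is_BOS;
    $self->handle_content($bb) if defined $bb && length $bb;
    $self->handle_EOS if $self->is_EOS;
    return 0;
}

# Defaults: BOS and EOS do nothing, content passes straight through.
sub handle_BOS     { return 0 }
sub handle_content { my ( $self, $bb ) = @_; $self->print($bb); return 0 }
sub handle_EOS     { return 0 }

package Counting::Filter;    # specialization through inheritance
use parent -norequire, 'Adapter::Filter';

sub handle_BOS {
    my $self = shift;
    $self->{count} = 0;
    $self->print("header\n");
    return 0;
}

sub handle_content {
    my ( $self, $bb ) = @_;
    $self->{count} += length $bb;
    return $self->SUPER::handle_content($bb);
}

sub handle_EOS {
    my $self = shift;
    $self->print("$self->{count} bytes filtered\n");
    return 0;
}

package main;

my $f = Counting::Filter->new;
$f->{bos} = 1; $f->handler("abc");    # first brigade
$f->{bos} = 0; $f->handler("de");     # middle brigade
$f->{eos} = 1; $f->handler(undef);    # EOS-only brigade
print @{ $f->{out} };
```

Filters that don't care about BOS/EOS would skip the adapter and keep writing a single handler(), so nobody pays for the extra dispatch unless they want it.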
The second case is where user code wants to call ap_add_xxx_filter() to
add a filter. Here's how that might look:
    my $f = Algae::Filter->new( ...parms... ) ;
    $r->add_output_filter( $f ) ; ## or $c->...
This would use the blessed object of the filter as the name of the
filter (which could be quickly registered if not found) and the
reference in $f as the $perl_ctx.
When modperl_output_filter_handler() then gets the first bucket brigade,
it'll see the perl_ctx is present and not call new. It'll pass in the
perl_ctx as above to the other three functions in the same way.
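Deriving the filter name from the object's class could be sketched like so. Everything here is a plain-Perl mock: the Algae::Filter below does not inherit from the real Apache::Filter, and the %registered hash and this add_output_filter() function are hypothetical stand-ins for whatever registration mod_perl would actually do.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Algae::Filter;    # mock class, just enough to have a blessed object

sub new {
    my ( $class, %parms ) = @_;
    return bless { %parms }, $class;
}

package main;

my %registered;    # hypothetical registry: filter name => 1

# Hypothetical stand-in: use the object's class as the filter name,
# register it on first sight, and keep the object itself as perl_ctx.
sub add_output_filter {
    my ( $chain, $f ) = @_;
    my $name = blessed($f);
    die "add_output_filter() needs a blessed filter object\n"
        unless defined $name;
    $registered{$name} ||= 1;
    push @$chain, { name => $name, perl_ctx => $f };
    return scalar @$chain;
}

my @chain;
my $f = Algae::Filter->new( tag => 'demo' );
add_output_filter( \@chain, $f );
print "$chain[0]{name} registered\n";
```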
Areas for improvement:
- Perhaps, when no new() is found and isa( "Algae::Filter",
"Apache::Filter" ) is true, the Apache::Filter object passed to the
handlers should be blessed into "Algae::Filter" to allow easier
method calls. There's a large number of filters that would need no
new()ing and this little trick would keep users from having to
write new() just to call SUPER::new().
- It would be nice to be able to set/get a context without needing to
go OO. Not sure if an Apache::Filter accessor would do the trick
or if it would be better to pass in an optional third arg to the
handlers.
- We don't expose enough flexibility for a filter writer to say
AP_FTYPE_CONTENT+1 or AP_FTYPE_CONTENT-1 to bias a filter towards
either end of a content chain.
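The reblessing trick from the first item above might be sketched as follows, again in plain Perl with mocks. Mock::Filter stands in for Apache::Filter, and defines_own_new() and context_for() are hypothetical helper names; the symbol-table check is just one possible way to tell "this class wrote its own new()" from "it inherited one".

```perl
use strict;
use warnings;

package Mock::Filter;    # hypothetical stand-in for Apache::Filter

sub new {
    my $class = shift;
    return bless {}, $class;
}

package Algae::Filter;    # subclass that deliberately defines no new()
use parent -norequire, 'Mock::Filter';

sub greet { return 'called as ' . ref $_[0] }

package main;

# True only if the class itself defines new(), ignoring inherited ones.
sub defines_own_new {
    my $class = shift;
    no strict 'refs';
    return defined &{"${class}::new"} ? 1 : 0;
}

# Pick the object to hand to the handlers: the class's own new() if it
# has one, otherwise the plain filter object reblessed into the subclass.
sub context_for {
    my ( $class, $plain ) = @_;
    return $class->new if defines_own_new($class);
    return bless $plain, $class if $class->isa( ref $plain );
    return $plain;
}

my $obj = context_for( 'Algae::Filter', Mock::Filter->new );
print $obj->greet, "\n";
```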
Wow, that's a lot of blather. Sorry.
Comments?
- Barrie