Re: Matchers vs. Selectors [was Re: Retuning Sitemap Design]

Berin Loritsch Fri, 11 Jan 2002 13:11:51 -0800

Stefano Mazzocchi wrote:

> Judson Lester wrote:
> 
> <snip/>
> 
>>The clearest dilemma I can see is that <select><match test=A><match
>>test=B></select> implies that if A matches, and B matches, two clauses occur,
>>but <select><when test=A><when test=B></select> implies that if A matches, B
>>won't be tested.  Is this FS or useful power?  
>>
> 
> I think you touched the nerve of the discussion:
> 
>  - matchers distill the behavior of boolean 'or'
>  - selectors distill the behavior of boolean 'and'
> 
> Berin suggested that URI-matching performed with 'and'-type boolean
> logic is intrinsically faster since it is algorithmically less complex.
> 
> I think he does have a point here!



I think that my point also includes the fact that URI is the primary match
for any site.  When we go to a site, we expect that a specific URI returns
a specific resource.  This is a contract we never want to break.

However, if I request "http://jarkarta.apache.org/developing-with-avalon",
the resource can be one of a number of different representations of the
same resource.  For instance, if Adobe Acrobat Reader had a browser in it,
Cocoon could detect it as the browser and return the PDF view of the
resource.  If I have Mozilla 0.9.7, I could get my markup optimized for
spec compliant browsers.  The basic underlying contract is that the URI
directly relates to a resource.

In that case, by using a simple HashMap to map unrolled URIs to pipelines
that each serve exactly one resource, we effectively speed up our access
to the desired resource.
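
To make the HashMap idea concrete, here is a minimal sketch.  The names
UnrolledSitemap and Pipeline are hypothetical stand-ins, not existing Cocoon
classes, and the "pipeline" here only records the resource it serves:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one pre-built pipeline per fully unrolled URI. */
final class UnrolledSitemap {

    /** Stand-in for a real pipeline; it only records the resource it serves. */
    static final class Pipeline {
        final String resource;

        Pipeline(String resource) {
            this.resource = resource;
        }
    }

    // Exact URI -> pipeline. No pattern matching happens at request time.
    private final Map<String, Pipeline> pipelines = new HashMap<>();

    void register(String uri, String resource) {
        pipelines.put(uri, new Pipeline(resource));
    }

    /** A single hash lookup; returns null when the URI is unknown. */
    Pipeline lookup(String uri) {
        return pipelines.get(uri);
    }
}
```

The lookup cost no longer depends on how many patterns the sitemap declares,
which is the whole point of unrolling.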

Currently the critical path to access the pipelines is way too long.  A
while ago, I posted a sequence diagram that shows just how complex a single
request is.  It has only gotten more complex.  The link to the diagram is
http://jakarta.apache.org/~bloritsch/CocoonServlet_service.png and is
quite large.

Getting back to the point of direct URI to pipeline matching:  if we require
any part of the resource to be dynamic (such as representation), we can
perform the selection of transformers based on other request aspects such
as client.  What will not change for the pipeline is the generator.


>                                    - o -
> 
> The goals of the indirect sitemap components (those who don't generate
> data that goes directly inside the SAX stream) are to provide support
> for the other direct sitemap components (those who do).
> 
> Right at the beginning of the sitemap design, it was clear that the
> functionality of 'routing' the request thru the different components was
> absolutely required in order to allow Cocoon2's sitemap to match
> Cocoon1's reactor in functionality. Otherwise, forking friction would
> have been developed.
> 
> At the same time, the reactor (which appeared as a very good idea when I
> proposed it) is sort of a dynamic-pipeline composer. We knew that
> dynamic pipelines provided several limitations:
> 
>  - were impossible to 'validate' at load-time
>  - were impossible to 'pre-compile' and optimize
>  - were harder to pool
>  - were less cache friendly
> 
> and more important:
> 
>  - were much more painful to debug!


We have only marginally addressed these issues in our implementation.
However, we are closer to where we belong.  Most of the resources
I deal with (i.e. URI targets) are static pipelines--even though the
content is somewhat dynamic (tied to database).  This will be the case
about 70% of the time or more.  However, themed sites do have a
dynamic pipeline.  Usually the point of dynamism in the pipeline is
directly related to one session attribute.

We also have to come to grips with what appears to be a dynamic pipeline,
but is really only a choice between two static pipelines.  This is the
authentication and authorization issue.  In this case, the priority of
the request/session attributes comes above the URI.



> Just like a real pipeline requires 'valves' to route its flow, we
> decided to introduce 'support components' that could do the same and
> decided to make them 'components' to allow the 'routing logic' to be
> pluggable.


:/

The more I look at it, the more matchers and selectors look like mere
program flow directives.  Granting them full Component status seems wasteful.
In fact, such flow control devices are bound to the particular Sitemap
implementation.  For instance, the difference between a RegexpURIMatcher
and the WildcardURIMatcher is the pattern used for matching.  In fact,
you could have one matcher that performs the test with a "protocol"
to select the pattern type.  For example:
"re:store-([0-9]*)/section-([0-9]*)/index.html" and
"wildcard:store-*/section-*/index.html" are roughly equivalent matches
(the regular expression is more specific).
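
A rough sketch of such a single matcher follows.  PrefixedPatternMatcher is a
hypothetical name, and the wildcard translation is a simplifying assumption
(each '*' becomes a group that stops at '/'), not the actual behaviour of
WildcardURIMatcher:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical matcher that picks the pattern syntax from a "protocol" prefix. */
final class PrefixedPatternMatcher {
    private final Pattern pattern;

    PrefixedPatternMatcher(String spec) {
        if (spec.startsWith("re:")) {
            pattern = Pattern.compile(spec.substring("re:".length()));
        } else if (spec.startsWith("wildcard:")) {
            pattern = Pattern.compile(wildcardToRegex(spec.substring("wildcard:".length())));
        } else {
            throw new IllegalArgumentException("Unknown pattern type: " + spec);
        }
    }

    /** Assumed translation: each '*' captures up to the next '/', the rest is literal. */
    private static String wildcardToRegex(String wildcard) {
        String[] literals = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("([^/]*)");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return regex.toString();
    }

    /** Returns the captured groups, or null when the whole URI does not match. */
    String[] match(String uri) {
        Matcher m = pattern.matcher(uri);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1);
        }
        return groups;
    }
}
```

Both pattern types compile down to the same internal form, so the sitemap
only ever needs to know about one matcher.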

The idea of the matcher and selector is best applied by unrolling the URIs.
The process of unrolling a URI can be demand based, or static.  If it is
demand based, the URI is processed through the matchers as it is received,
and it is added to the "good" map or the "bad" map.  If unrolling is statically
done, the Sitemap expands the matches and examines the hard drive for the
files that the match applies to.
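
The demand-based variant could look something like the sketch below.
DemandUnroller, Rule, and the pipeline names are all hypothetical, and I use
a set rather than a map for the "bad" URIs since nothing needs to be stored
against them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical demand-based unroller: patterns run once per distinct URI. */
final class DemandUnroller {

    private static final class Rule {
        final Pattern pattern;
        final String pipelineName;

        Rule(Pattern pattern, String pipelineName) {
            this.pattern = pattern;
            this.pipelineName = pipelineName;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final Map<String, String> good = new HashMap<>(); // URI -> pipeline name
    private final Set<String> bad = new HashSet<>();          // URIs known to match nothing
    int slowPathRuns = 0;                                     // counts full pattern scans

    void addRule(String regex, String pipelineName) {
        rules.add(new Rule(Pattern.compile(regex), pipelineName));
    }

    /** Returns the pipeline name for the URI, or null if no rule matches. */
    String resolve(String uri) {
        String cached = good.get(uri);
        if (cached != null) {
            return cached;
        }
        if (bad.contains(uri)) {
            return null;
        }
        slowPathRuns++;
        for (Rule rule : rules) {
            if (rule.pattern.matcher(uri).matches()) {
                good.put(uri, rule.pipelineName);
                return rule.pipelineName;
            }
        }
        bad.add(uri);
        return null;
    }
}
```

One caveat with this sketch: clients can request arbitrary URIs, so a real
implementation would probably need to bound the size of the "bad" set.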

More a bit later.




> In fact, like Judson brilliantly suggested above, selectors perform real
> 'routing' (choose one route into n) while matchers perform conditional
> deviations that go back to the main flow.
> 
> Using tubes and valves that would yield:
> 
>                         +->[ ]--+
>                         |       |
>                         |       |
>  matchers            -->*-------+--->
> 
> 
>                           ->[ ]-->
>                          /
>  selectors           -->*-->[ ]-->
>                          \
>                           ->[ ]-->
> 
> [having studied opto-electronics I've seen all sort of things with
> optical pipelines, ie. fibers]
> 
> As you can evidently see, topologically, the two are *not* equivalent:
> you can't use one to make the other.
> 
> -------------
> Question: can one component perform both?
> -------------


It is not an easy thing to decide.  For instance, the Action and the Matcher
could both be merged quite easily as they have similar semantics, though
differing focus.  Consider the API for an Action:

interface Action
{
     // Returns a Map of substitution variables.
     Map act(Redirector rd, Resolver rs, Map objectMap, String source, Parameters p);
}

It's at least close.  A matcher is basically a simplified version of this.

In essence the redirector and resolver are removed from a Matcher, and a matcher
has the ability to precompile the pattern.

Both of them return a Map to define substitution variables.


Contrast this with a Selector which returns a boolean.
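
To illustrate the difference in return types, here is a sketch with
deliberately simplified, hypothetical interfaces (MapMatcher and
BooleanSelector are not the real Cocoon signatures).  Assuming a null Map
means "no match", a Map-returning component can always be narrowed to a
boolean one, but not the other way around:

```java
import java.util.Map;

/** Hypothetical, simplified interfaces contrasting the two return types. */
final class ReturnTypes {

    /** Returns substitution variables, or null when there is no match. */
    interface MapMatcher {
        Map<String, String> match(String pattern, String uri);
    }

    /** Only answers yes or no; no substitution variables are available. */
    interface BooleanSelector {
        boolean select(String expression, String uri);
    }

    /** Narrows a matcher to a selector by discarding the returned Map. */
    static BooleanSelector asSelector(MapMatcher matcher) {
        return (expression, uri) -> matcher.match(expression, uri) != null;
    }
}
```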



> the good-old if/else semantics does matchers
> 
>  <if test="">
>   ...
>  </if>
> 
> as well as selectors
> 
>  <if test="">
>   ...
>   <else>
>     <if test="">
>      ...
>     </if>
>   </else>
>  </if>


Yuck.  I think this is why we had the Matcher/Selector idea to begin with.



> ----------
> Question: *should* they be merged?
> ----------
> 
> Well, this is the key question.
> 
> The above syntax is too verbose and ugly for my personal taste.
> Moreover, the <if> tag provides a strong negative appeal for people
> approaching an XML markup language.


I agree.




> ------------
> question: should selectors return maps of tokens?
> ------------
> 
> In that case, we achieve two things:
> 
> 1) the interfaces Matcher and Selector are unified, thus removing the
> perception that they are just doing the same thing. In fact, from the
> topological views, the * (star) performs the routing process and it
> doesn't make any sense to have two different ones. Especially if they
> end up being the cut/paste clone of one another.
> 
> 2) and-type boolean processing can be performed even for URI-matching,
> thus providing an algorithmical way to increase sitemap interpretation.
> 
> 3) back-compatibility at the sitemap level can be achieved
> 
> 4) we don't use ugly markup code.


The returned map should have a static lookup object, that way unrolled
selectors would store the possible answers in a lookup with a callback
for the logic that is performed for that case.  For instance:

Map results = selector.select( "test", objectMap, params );
Command callback = (Command) commandMap.get( results.get( Selector.RETURN_OBJECT ) );
callback.execute( results, objectMap, params, pipeline );

It simplifies a lot of implementation details, and the sitemap semantics
for the improved selector would look like this:

<selector type="regexp-uri">
   <case test="store-([0-9]*)\/section-([0-9]*)\/index\.xhtml">
     <generator uri="docs/store-{1}/section{2}/index.html" />
     <transformer/>
     <serializer/>
   </case>
   <case test="store-([0-9]*)\/compliance\.pdf">
     <generator uri="docs/compliance-report.xsp">
       <parameter name="storenbr" value="{1}"/>
     </generator>
     <transformer/>
     <serializer/>
   </case>
</selector>

I think this is not only more compact than several matchers, it also
carries the connotation that every test is mutually exclusive.
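
For what it is worth, the lookup-plus-callback idea can be sketched in a
self-contained way as below.  SelectorDispatch, its Command interface, and
the string a command returns are hypothetical simplifications (the objectMap,
params, and pipeline arguments are left out), so treat it as an illustration
only:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical selector: the first matching case wins, then its callback runs. */
final class SelectorDispatch {
    static final String RETURN_OBJECT = "return-object";

    /** Simplified callback; a real one would assemble a pipeline. */
    interface Command {
        String execute(Map<String, String> results);
    }

    // Cases are kept in declaration order so the tests stay mutually exclusive.
    private final Map<String, Pattern> cases = new LinkedHashMap<>();
    private final Map<String, Command> commandMap = new HashMap<>();

    void addCase(String regex, Command command) {
        cases.put(regex, Pattern.compile(regex));
        commandMap.put(regex, command);
    }

    /** Returns the lookup key plus numbered groups, or null if no case matches. */
    Map<String, String> select(String uri) {
        for (Map.Entry<String, Pattern> entry : cases.entrySet()) {
            Matcher m = entry.getValue().matcher(uri);
            if (m.matches()) {
                Map<String, String> results = new HashMap<>();
                results.put(RETURN_OBJECT, entry.getKey());
                for (int i = 1; i <= m.groupCount(); i++) {
                    results.put(String.valueOf(i), m.group(i));
                }
                return results;
            }
        }
        return null;
    }

    /** Selects a case and runs its callback; returns null when nothing matches. */
    String process(String uri) {
        Map<String, String> results = select(uri);
        if (results == null) {
            return null;
        }
        Command callback = commandMap.get(results.get(RETURN_OBJECT));
        return callback.execute(results);
    }
}
```

Because select() returns on the first matching case, at most one callback
ever runs per request, which matches the mutually exclusive reading of the
<case> elements.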

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin

