Re: Matchers vs. Selectors [was Re: Retuning Sitemap Design]

Stefano Mazzocchi Sat, 12 Jan 2002 05:50:20 -0800

Berin Loritsch wrote:
> 
> Stefano Mazzocchi wrote:
> 
> > Judson Lester wrote:
> >
> > <snip/>
> >
> >>The clearest dilemma I can see is that <select><match test=A><match
> >>test=B></select> implies that if A matches, and B matches, two clauses occur,
> >>but <select><when test=A><when test=B></select> implies that if A matches, B
> >>won't be tested.  Is this FS or useful power?
> >>
> >
> > I think you touched the nerve of the discussion:
> >
> >  - matchers distill the behavior of boolean 'or'
> >  - selectors distill the behavior of boolean 'and'
> >
> > Berin suggested that URI-matching performed with 'and'-type boolean
> > logic is intrinsically faster since it is algorithmically less complex.
> >
> > I think he does have a point here!
> 
> I think that my point also includes the fact that URI is the primary match
> for any site.  When we go to a site, we expect that a specific URI returns
> a specific resource.  This is a contract we never want to break.


absolutely.

> However, if I request "http://jarkarta.apache.org/developing-with-avalon"
> the resource can be one of a number of different representations of the
> same resource.  For instance, if Adobe Acrobat Reader had a browser in it,
> Cocoon could detect it as the browser and return the PDF view of the
> resource.  If I have Mozilla 0.9.7, I could get my markup optimized for
> spec compliant browsers.  The basic underlying contract is that the URI
> directly relates to a resource.

as the name implies.

> In that case, using a simple HashMap to match unrolled URIs to pipelines
> that have only one resource per pipeline we effectively speed our access
> to the desired resource.

The problem is that if you require a pipeline per resource, your
administration costs grow linearly with the size of the URI space, and we
*DO*NOT* want this, no matter what speed improvement you get.

[btw, I think it's possible to perform constant-time hash-based jumps
even on regexps, but I haven't thought about the algorithm yet]
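
To be clear, the following is not that algorithm. It is only a rough
sketch of how a hash lookup can sit in front of ordinary sequential
pattern matching, so that the number of patterns to administer stays
small while repeated requests skip the scan. All names (UnrollingRouter,
good, bad, route) are hypothetical, and a real cache would need a size
bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch: sequential pattern matching with a hash-based cache in front. */
final class UnrollingRouter {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> pipelines = new ArrayList<>();
    private final Map<String, String> good = new HashMap<>(); // URI -> pipeline name
    private final Set<String> bad = new HashSet<>();          // URIs known to match nothing
    int sequentialScans = 0; // exposed only to show when the slow path runs

    /** Registers one pattern; a single entry can cover many URIs. */
    void add(String regex, String pipeline) {
        patterns.add(Pattern.compile(regex));
        pipelines.add(pipeline);
    }

    /** Returns the pipeline name for the URI, or null if nothing matches. */
    String route(String uri) {
        String hit = good.get(uri);
        if (hit != null) {
            return hit;                 // constant-time path for a URI seen before
        }
        if (bad.contains(uri)) {
            return null;                // known miss, no scan needed
        }
        sequentialScans++;
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(uri).matches()) {
                good.put(uri, pipelines.get(i));
                return pipelines.get(i);
            }
        }
        bad.add(uri);
        return null;
    }
}
```

The first request for a URI still pays for the linear scan; only later
requests for the same URI become hash lookups.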

> Currently the critical path to access the pipelines is way too long.  A
> while ago, I posted a sequence diagram that shows just how complex a single
> request is.  It has only gotten more complex.  The link to the diagram is
> http://jakarta.apache.org/~bloritsch/CocoonServlet_service.png and is
> quite large.

Yes, I was perfectly aware that the intrinsically linear flow of the
sitemap could become an architectural bottleneck. This is why we
introduced wildcard-based matching to reduce the pipeline complexity.
 
<snip/>

> > Just like a real pipeline requires 'valves' to route its flow, we
> > decided to introduce 'support components' that could do the same and
> > decided to make them 'components' to allow the 'routing logic' to be
> > pluggable.
> 
> :/
> 
> The more I look at it, the concept of matcher/selector is merely program
> flow directives.  Granting them full Component status seems wasteful.

I disagree. We have barely started to scratch the surface of what
matchers can do. I'm sure people will find very creative ways to add new
matching logic (CC/PP for example, or load, or time of day to drive
sunset-like color-shading stylesheets, you name it).

The current problem is that Cocoon doesn't allow you to drop your Cocoon
Web Applications on it just like we deploy WARs on Tomcat.

*this* is the key. Once we have that, people will find very creative
ways of using these conditional components and will love the ability to
reuse them all over their stuff with a simple line added to their
sitemap.

> In fact, such flow control devices are bound to the particular Sitemap
> implementation.  

See above: the problem is not in the sitemap design, but in Cocoon's
lack of easy deployment of web applications on top (which is something I
want to fix ASAP!).

> For instance, the difference between a RegexpURIMatcher
> and the WildcardURIMatcher is the pattern used for matching.  In fact,
> you could have one matcher that performs the test with a "protocol"
> to select the pattern type.  For example:
> "re:store-([0-9]*)/section-([0-9]*)/index.html" and
> "wildcard:store-*/section-*/index.html" are roughly equivalent matches
> (the regular expression is more specific).

Yes, but I think it's useful syntactic sugar since

 <match type="regexp" pattern="...">

is much more readable and understandable than 

 <match pattern="re:...">
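
For reference, the single-matcher-with-prefix idea could look roughly
like the sketch below. The class name PrefixedPatterns is made up, and
the wildcard rule is an assumption: a single '*' is taken to mean any
run of characters other than '/', and nothing else (such as '**') is
handled.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: one entry point that picks the pattern dialect from a prefix. */
final class PrefixedPatterns {
    static Pattern compile(String spec) {
        if (spec.startsWith("re:")) {
            // The remainder is already a regular expression.
            return Pattern.compile(spec.substring("re:".length()));
        }
        if (spec.startsWith("wildcard:")) {
            String glob = spec.substring("wildcard:".length());
            // Quote the literal pieces and turn each '*' into a capturing group.
            String[] literals = glob.split("\\*", -1);
            StringBuilder regex = new StringBuilder();
            for (int i = 0; i < literals.length; i++) {
                if (i > 0) {
                    regex.append("([^/]*)"); // assumption: '*' does not cross '/'
                }
                regex.append(Pattern.quote(literals[i]));
            }
            return Pattern.compile(regex.toString());
        }
        throw new IllegalArgumentException("unknown pattern type: " + spec);
    }
}
```

Both "re:store-([0-9]*)/section-([0-9]*)/index.html" and
"wildcard:store-*/section-*/index.html" accept
store-12/section-3/index.html, but only the wildcard form accepts
store-abc/section-x/index.html, which is the sense in which the regular
expression is more specific.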
 
> The idea of the matcher and selector is best applied by unrolling the URIs.
> The process of unrolling a URI can be demand based, or static.  If it is
> demand based, the URI is processed through the matchers as it is received,
> and it is added to the "good" map or the "bad" map.  If unrolling is statically
> done, the Sitemap expands the matches and examines the harddrive for the
> files that the match applies to.
> 
> More a bit later.
> 
> > In fact, like Judson brilliantly suggested above, selectors perform real
> > 'routing' (choose one route into n) while matchers perform conditional
> > deviations that go back to the main flow.
> >
> > Using tubes and valves that would yield:
> >
> >                         +->[ ]--+
> >                         |       |
> >                         |       |
> >  matchers            -->*-------+--->
> >
> >
> >                           ->[ ]-->
> >                          /
> >  selectors           -->*-->[ ]-->
> >                          \
> >                           ->[ ]-->
> >
> > [having studied opto-electronics I've seen all sort of things with
> > optical pipelines, ie. fibers]
> >
> > As you can evidently see, topologically, the two are *not* equivalent:
> > you can't use one to make the other.
> >
> > -------------
> > Question: can one component perform both?
> > -------------
> 
> It is not an easy thing to decide.  For instance, the Action and the Matcher
> could both be merged quite easily as they have similar semantics, though
> differing focus.  Consider the API for an Action:
> 
> interface Action
> {
>      Map act(Redirector rd, Resolver rs, Map objectMap, String source, Parameters p);
> }
> 
> It's at least close.  A matcher is basically a simplified version of this.

Yes, good point.

> In essence the redirector and resolver are removed from a Matcher, and a matcher
> has the ability to precompile the pattern.
> 
> Both of them return a Map to define substitution variables.
> 
> Contrast this with a Selector which returns a boolean.

Hmmm, what do others think of this?
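
To make the contrast concrete, here is a rough sketch using simplified,
hypothetical interfaces (SimpleMatcher and SimpleSelector are stand-ins,
not the real Cocoon ones). It only shows the difference in return type
and in flow: matched clauses fall through, selected clauses are
exclusive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: returns substitution tokens, or null when the pattern does not apply. */
interface SimpleMatcher {
    Map<String, String> match(String pattern, String uri);
}

/** Hypothetical stand-in: answers only yes or no, so there is nothing to substitute afterwards. */
interface SimpleSelector {
    boolean select(String expression, String uri);
}

final class FlowDemo {
    /** Matcher flow: every clause that matches runs, then control returns to the main flow. */
    static List<String> runMatchers(SimpleMatcher m, List<String> patterns, String uri) {
        List<String> executed = new ArrayList<>();
        for (String p : patterns) {
            if (m.match(p, uri) != null) {
                executed.add(p);
            }
        }
        return executed;
    }

    /** Selector flow: the first clause that tests true wins and the rest are never tested. */
    static List<String> runSelector(SimpleSelector s, List<String> tests, String uri) {
        for (String t : tests) {
            if (s.select(t, uri)) {
                return Collections.singletonList(t);
            }
        }
        return Collections.emptyList();
    }
}
```

With two overlapping clauses that both apply to a request, runMatchers
reports both of them and runSelector reports only the first.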
 
> > the good-old if/else semantics does matchers
> >
> >  <if test="">
> >   ...
> >  </if>
> >
> > as well as selectors
> >
> >  <if test="">
> >   ...
> >   <else>
> >     <if test="">
> >      ...
> >     </if>
> >   </else>
> >  </if>
> 
> Yuck.  I think this is why we had the Matcher/Selector idea to begin with.

Exactly!

> > ----------
> > Question: *should* they be merged?
> > ----------
> >
> > Well, this is the key question.
> >
> > The above syntax is too verbose and ugly for my personal taste.
> > Moreover, the <if> tag provides a strong negative appeal for people
> > approaching an XML markup language.
> 
> I agree.
> 
> > ------------
> > question: should selectors return maps of tokens?
> > ------------
> >
> > In that case, we achieve several things:
> >
> > 1) the interfaces Matcher and Selector are unified, thus removing the
> > perception that they are just doing the same thing. In fact, from the
> > topological views, the * (star) performs the routing process and it
> > doesn't make any sense to have two different ones, especially if they
> > end up being cut/paste clones of one another.
> >
> > 2) and-type boolean processing can be performed even for URI-matching,
> > thus providing an algorithmic way to speed up sitemap interpretation.
> >
> > 3) back-compatibility at the sitemap level can be achieved
> >
> > 4) we don't use ugly markup code.
> 
> The returned map should have a static lookup object, that way unrolled
> selectors would store the possible answers in a lookup with a callback
> for the logic that is performed for that case.  For instance:
> 
> Map results = selector.select( "test", objectMap, params );
> Command callback = (Command) commandMap.get( results.get( Selector.RETURN_OBJECT ) );
> callback.execute( results, objectMap, params, pipeline );
> 
> It simplifies a lot of implementation details, and the sitemap semantics
> for the improved selector would look like this:
> 
> <selector type="regexp-uri">
>    <case test="store-([0-9]*)\/section-([0-9]*)\/index\.xhtml">
>      <generator uri="docs/store-{1}/section{2}/index.html" />
>      <transformer/>
>      <serializer/>
>    </case>
>    <case test="store-([0-9]*)\/compliance\.pdf">
>      <generator uri="docs/compliance-report.xsp">
>        <parameter name="storenbr" value="{1}"/>
>      </generator>
>      <transformer/>
>      <serializer/>
>    </case>
> </selector>
> 
> This I think is not only more compact than several matchers, but
> the connotation is that every test is mutually exclusive.

I agree with you on the fact that sometimes the 'fall-thru' behavior of
matchers is misleading.

And I agree with you that even selectors should provide the ability to
return tokens.
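
As a rough illustration of the lookup-and-callback idea quoted above,
assuming hypothetical names throughout (TokenSelector, CaseCommand,
CaseDispatcher, and the RETURN_OBJECT key value are all made up, and the
objectMap, params and pipeline arguments are left out to keep it short):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical callback, loosely modelled on the Command in the quoted snippet. */
interface CaseCommand {
    String execute(Map<String, String> results);
}

/** Hypothetical selector that returns tokens plus the key of the case that fired. */
final class TokenSelector {
    static final String RETURN_OBJECT = "selector.return-object"; // assumed key name
    private final Map<String, Pattern> cases = new LinkedHashMap<>(); // keeps declaration order

    void addCase(String key, String regex) {
        cases.put(key, Pattern.compile(regex));
    }

    /** Tests the cases in order; the first full match wins. Returns null when no case applies. */
    Map<String, String> select(String uri) {
        for (Map.Entry<String, Pattern> c : cases.entrySet()) {
            Matcher m = c.getValue().matcher(uri);
            if (m.matches()) {
                Map<String, String> results = new HashMap<>();
                results.put(RETURN_OBJECT, c.getKey());
                for (int g = 1; g <= m.groupCount(); g++) {
                    results.put(String.valueOf(g), m.group(g)); // tokens {1}, {2}, ...
                }
                return results;
            }
        }
        return null;
    }
}

final class CaseDispatcher {
    /** Looks up the callback for the case that fired and runs it with the tokens. */
    static String dispatch(TokenSelector selector, Map<String, CaseCommand> commandMap, String uri) {
        Map<String, String> results = selector.select(uri);
        if (results == null) {
            return null;
        }
        CaseCommand callback = commandMap.get(results.get(TokenSelector.RETURN_OBJECT));
        return callback == null ? null : callback.execute(results);
    }
}
```

Note that the hash lookup here only replaces the branch after a case has
fired; finding the case is still a sequential scan over the patterns.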

So, for example (keeping syntax back compatible)

 <select type="uri">
    <when test="store-([0-9]*)\/section-([0-9]*)\/index\.xhtml" type="regexp">
      <generator src="docs/store-{1}/section{2}/index.html" />
      <transformer/>
      <serializer/>
    </when>
    <when test="store-*/compliance.pdf" type="wildcard">
      <generator src="docs/compliance-report.xsp">
        <parameter name="storenbr" value="{1}"/>
      </generator>
      <transformer/>
      <serializer/>
    </when>
    <otherwise>
      <generator src="{1}"/>  <!-- what does the 'otherwise' part return? -->
      <transformer/>
      <serializer/>
    </otherwise>
 </select>

but the question becomes:

if we can't implement hash-based jumping based on regexps, the above has
no effect on performance since you have to perform the regexp matching
sequentially anyway, and hitting a serializer has the effect of exiting
the pipeline anyway, even on matchers.
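
A small sketch of that argument, with hypothetical names (Clause,
TestCounter) and the assumption that patterns are tested strictly in
declaration order: it counts how many patterns each style has to test
before the request is finished.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical clause: a pattern plus whether its body ends in a serializer. */
final class Clause {
    final Pattern pattern;
    final boolean endsInSerializer;

    Clause(String regex, boolean endsInSerializer) {
        this.pattern = Pattern.compile(regex);
        this.endsInSerializer = endsInSerializer;
    }
}

final class TestCounter {
    /** Matcher style: a matched clause falls through unless it ends in a serializer. */
    static int matcherTests(List<Clause> clauses, String uri) {
        int tests = 0;
        for (Clause c : clauses) {
            tests++;
            if (c.pattern.matcher(uri).matches() && c.endsInSerializer) {
                break; // serializer reached, the pipeline is done
            }
        }
        return tests;
    }

    /** Selector style: the first clause that tests true ends the search. */
    static int selectorTests(List<Clause> clauses, String uri) {
        int tests = 0;
        for (Clause c : clauses) {
            tests++;
            if (c.pattern.matcher(uri).matches()) {
                break;
            }
        }
        return tests;
    }
}
```

When every clause ends in a serializer, the two counts are the same for
any request; they differ only when a matched clause has no serializer
and the flow falls through to later patterns.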

So, really, what benefits are we getting?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------


