Matchers vs. Selectors [was Re: Retuning Sitemap Design]

Stefano Mazzocchi Fri, 11 Jan 2002 12:03:05 -0800

Judson Lester wrote:

<snip/>


> The clearest dilemma I can see is that <select><match test=A><match
> test=B></select> implies that if A matches, and B matches, two clauses occur,
> but <select><when test=A><when test=B></select> implies that if A matches, B
> won't be tested.  Is this FS or useful power?  

I think you touched the nerve of the discussion:

 - matchers distill the behavior of boolean 'or'
 - selectors distill the behavior of boolean 'and'

Berin suggested that URI-matching performed with 'and'-type boolean
logic is intrinsically faster since it is algorithmically less complex.

I think he does have a point here!

<snip/>

> > Moreover, matchers 'match a request' while selectors 'select paths'.
> > They clearly separate concerns! Their functionality and implementations
> > partially overlap because both require 'conditional operations', but one
> > is declarative (matchers) and one is procedural (selectors).
> 
> Again, this may be a FS, but I'm dubious about how 'natural' a separation
> those concerns are.  Matchers match anything, and on what other basis do
> selectors 'select paths' than on the request?  I'm concerned about pipelines
> becoming unnecessarily convoluted because a single concern has been separated.

Good concern.

And given the fact that this confrontation between matchers and
selectors seems to appear once in a while, I think we have some design
work to do.

Ok, let's do it.

                                   - o -

The goal of the indirect sitemap components (those that don't generate
data that goes directly inside the SAX stream) is to provide support
for the direct sitemap components (those that do).

Right at the beginning of the sitemap design, it was clear that the
functionality of 'routing' the request thru the different components was
absolutely required in order to allow Cocoon2's sitemap to match
Cocoon1's reactor in functionality. Otherwise, forking friction would
have developed.

At the same time, the reactor (which appeared as a very good idea when I
proposed it) is sort of a dynamic-pipeline composer. We knew that
dynamic pipelines had several limitations. They were:

 - impossible to 'validate' at load-time
 - impossible to 'pre-compile' and optimize
 - harder to pool
 - less cache friendly

and, more importantly:

 - much more painful to debug!

Just like a real pipeline requires 'valves' to route its flow, we
decided to introduce 'support components' that could do the same and
decided to make them 'components' to allow the 'routing logic' to be
pluggable.

One early objection Berin raised is that, since web browsers are driven
by URIs, 'uri-based routing' is the most used logic and should be
explicitly expressed in the sitemap design, just like we do for
aggregation.

---------------------------------------------------
Question: should URI-based routing logic have a more explicit place in
the sitemap?
---------------------------------------------------

I think it's a very good question, even if I disagree with the intention
of easing URI-space administration for system administrators.

At the same time (and this was the reason *not* to do it since the
beginning!), the fact that URI-based routing is not more important than
any other routing logic suggests that other routing logics can be
implemented and used with the same ease!

Look at Apache: URI-based routing is a piece of cake (well, not really,
but easy enough, just add your 'Alias' and you're done) while
non-URI-based routing is DEADLY HARD! Or at the very least, it is
perceived as such, since mod_rewrite is black art! (and deadly plagued
with unbalanced FS!)

The cocoon sitemap was *designed* to remove that unbalanced perception
and make it possible for people to start designing their request space
*without* starting from the URI-space.

I still consider it valuable to provide this 'routing-logic equivalence'
right from the sitemap semantics.

Berin also suggested that placing this URI-based logic into the sitemap
should allow for performance improvements. While I think this is mostly
an implementation detail, I think he is right in noting that matchers
and selectors have different algorithmic behaviors and that, sometimes,
URI-based routing should use selectors to improve efficiency.

In fact, as Judson brilliantly suggested above, selectors perform real
'routing' (choose one route out of n) while matchers perform conditional
deviations that go back to the main flow.

Using tubes and valves that would yield:

                        +->[ ]--+
                        |       |
                        |       |
 matchers            -->*-------+--->


                          ->[ ]-->
                         /
 selectors           -->*-->[ ]-->
                         \
                          ->[ ]-->

[having studied opto-electronics I've seen all sort of things with
optical pipelines, ie. fibers]

As you can evidently see, topologically, the two are *not* equivalent:
you can't use one to make the other.
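
The two topologies can be sketched in a few lines of Python. This is only
an illustration, not Cocoon code: the names run_matchers and run_selector,
the (test, action) clause pairs, and the example URI tests are all
assumptions made up for this sketch.

```python
def run_matchers(request, clauses):
    """Matcher-style: every clause is tested; each match fires, then flow goes on."""
    fired, tests = [], 0
    for test, action in clauses:
        tests += 1
        if test(request):
            fired.append(action)
    return fired, tests


def run_selector(request, clauses, otherwise=None):
    """Selector-style: the first matching clause wins; later ones are never tested."""
    tests = 0
    for test, action in clauses:
        tests += 1
        if test(request):
            return [action], tests
    # Fall-back route, used only when no clause matched.
    return ([otherwise] if otherwise is not None else []), tests


# Hypothetical clauses A and B that both match the same request.
clauses = [
    (lambda uri: uri.startswith("/docs"), "A"),
    (lambda uri: uri.endswith(".html"), "B"),
]

matched = run_matchers("/docs/index.html", clauses)
selected = run_selector("/docs/index.html", clauses)
```

For a request that satisfies both tests, the matcher-style loop fires A
and B after two tests, while the selector-style loop fires only A after
one test. That is the dilemma from the quoted text, and also where the
efficiency argument comes from.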

-------------
Question: can one component perform both?
-------------

The good old if/else semantics can express matchers

 <if test="">
  ...
 </if>

as well as selectors

 <if test="">
  ...
  <else>
    <if test="">
     ...
    </if>
  </else>
 </if>
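
As a rough check that one construct can cover both shapes, here is a
small Python interpreter for this if/else markup. It is a sketch under
assumptions: the <map>, <a> and <b> elements are hypothetical, and the
test attribute is treated as a plain substring of the request URI,
which is not what any real sitemap component does.

```python
import xml.etree.ElementTree as ET


def evaluate(node, request, out):
    """Collect the names of the action elements reached for this request."""
    for child in node:
        if child.tag == "if":
            # Assumed test semantics: the attribute is a substring of the request.
            if child.get("test", "") in request:
                evaluate([c for c in child if c.tag != "else"], request, out)
            else:
                for branch in child.findall("else"):
                    evaluate(branch, request, out)
        else:
            out.append(child.tag)
    return out


# Sibling <if> elements behave like matchers: each one is tested.
matcher_style = ET.fromstring(
    '<map><if test="/docs"><a/></if><if test=".html"><b/></if></map>'
)

# Nesting the second <if> inside <else> behaves like a selector.
selector_style = ET.fromstring(
    '<map><if test="/docs"><a/><else><if test=".html"><b/></if></else></if></map>'
)

both = evaluate(matcher_style, "/docs/index.html", [])
first_only = evaluate(selector_style, "/docs/index.html", [])
```

With a request that satisfies both tests, the sibling form reaches a and
b, while the nested form reaches only a.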

----------
Question: *should* they be merged?
----------

Well, this is the key question.

The above syntax is too verbose and ugly for my personal taste.
Moreover, the <if> tag provides a strong negative appeal for people
approaching an XML markup language.

Let's list the differences:

 matchers:
   - or-like behavior
   - return map of tokens
   - don't provide a fall-back mechanism

 selectors:
   - and-like behavior
   - do not return anything
   - provide a fall-back mechanism

Let's now outline the reasons for these differences:

 1) matchers don't provide a fall-back mechanism because they can't:
every matcher is executed anyway.

 2) if selectors and matchers can be mixed with no restriction, there is
no need to have a single component that performs both 'or'-type and
'and'-type boolean action.

IMO, the real key question becomes

------------
question: should selectors return maps of tokens?
------------

In that case, we achieve four things:

1) the interfaces Matcher and Selector are unified, thus removing the
perception that they are just doing the same thing. In fact, from the
topological views, the * (star) performs the routing process and it
doesn't make any sense to have two different ones. Especially if they
end up being cut/paste clones of one another.

2) and-type boolean processing can be performed even for URI-matching,
thus providing an algorithmic way to speed up sitemap interpretation.

3) back-compatibility at the sitemap level can be achieved

4) we don't use ugly markup code.
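
One way to picture such a unified component is the Python sketch below.
It is not the Cocoon Matcher or Selector interface: the route function,
its first_match_only flag, the action names, and the use of regular
expressions with named groups as the source of the token map are all
assumptions chosen for illustration.

```python
import re


def route(request, clauses, first_match_only):
    """Return an (action, tokens) pair for each clause that fires.

    Each successful test yields a dict of tokens, so the same component
    can serve matcher-style and selector-style routing.
    """
    results = []
    for pattern, action in clauses:
        m = re.fullmatch(pattern, request)
        if m is None:
            continue
        results.append((action, m.groupdict()))
        if first_match_only:
            break  # selector-style: stop at the first route taken
    return results


# Hypothetical URI patterns and action names.
clauses = [
    (r"/docs/(?P<page>[^/]+)\.html", "render-doc"),
    (r"/(?P<section>[^/]+)/.*", "section-default"),
]

all_hits = route("/docs/intro.html", clauses, first_match_only=False)
first_hit = route("/docs/intro.html", clauses, first_match_only=True)
```

In both modes the clause that fires hands back its tokens (here page
is "intro"), so selector-style routing no longer loses the map that
matchers return.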

Comments?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------

