Berin Loritsch wrote:
>
> Stefano Mazzocchi wrote:
>
> > Judson Lester wrote:
> >
> > <snip/>
> >
> >>The clearest dilemma I can see is that <select><match test=A><match
> >>test=B></select> implies that if A matches, and B matches, two clauses occur,
> >>but <select><when test=A><when test=B></select> implies that if A matches, B
> >>won't be tested.  Is this FS or useful power?
> >>
> >
> > I think you touched the nerve of the discussion:
> >
> >  - matchers distill the behavior of boolean 'or'
> >  - selectors distill the behavior of boolean 'and'
> >
> > Berin suggested that URI-matching performed with 'and'-type boolean
> > logic is intrinsically faster since it is algorithmically less complex.
> >
> > I think he does have a point here!
>
> I think that my point also includes the fact that URI is the primary match
> for any site.  When we go to a site, we expect that a specific URI returns
> a specific resource.  This is a contract we never want to break.

absolutely.

> However, if I request "http://jarkarta.apache.org/developing-with-avalon"
> the resource can be one of a number of different representations of the
> same resource.  For instance, if Adobe Acrobat Reader had a browser in it,
> Cocoon could detect it as the browser and return the PDF view of the
> resource.  If I have Mozilla 0.9.7, I could get my markup optimized for
> spec compliant browsers.  The basic underlying contract is that the URI
> directly relates to a resource.

as the name implies.

> In that case, using a simple HashMap to match unrolled URIs to pipelines
> that have only one resource per pipeline we effectively speed our access
> to the desired resource.

The problem is that if you require a pipeline per resource, your
administration costs grow linearly with the URI-space size, and we
*DO*NOT* want this, no matter what speed improvement you get.

[btw, I think it's possible to perform time-constant hash-based jumps
even on regexps, but didn't think about the algorithm yet]

> Currently the critical path to access the pipelines is way too long.  A
> while ago, I posted a sequence diagram that shows just how complex a single
> request is.  It has only gotten more complex.  The link to the diagram is
> http://jakarta.apache.org/~bloritsch/CocoonServlet_service.png and is
> quite large.

Yes, I was perfectly aware of the fact that the intrinsic linear flow of
the sitemap could potentially be an architectural bottleneck; this is
why we introduced wildcard-based matching to reduce the pipeline
complexity.

<snip/>

> > Just like a real pipeline requires 'valves' to route its flow, we
> > decided to introduce 'support components' that could do the same and
> > decided to make them 'components' to allow the 'routing logic' to be
> > pluggable.
>
> :/
>
> The more I look at it, the concept of matcher/selector is merely program
> flow directives.  Granting them full Component status seems wasteful.

I disagree.

We just barely started to scratch the surface of what matchers can do.
I'm sure people will find very creative ways to add new matching logics
(CC/PP for example, or load, or time of the day to perform color-shading
sunset-like stylesheets, you name it).

The current problem is that Cocoon doesn't allow you to drop your Cocoon
Web Applications on it just like we deploy WARs on Tomcat.

*this* is the key.

Once we have that, people will find very creative ways of using these
conditional components and will love the ability to reuse them all over
their stuff with a simple line added to their sitemap.

> In fact, such flow control devices are bound to the particular Sitemap
> implementation.

See above: the problem is not in the sitemap design, but in Cocoon's
lack of easy deploying of web applications on top. (which is something I
want to fix ASAP!)

> For instance, the difference between a RegexpURIMatcher
> and the WildcardURIMatcher is the pattern used for matching.  In fact,
> you could have one matcher that performs the test with a "protocol"
> to select the pattern type.  For example:
> "re:store-([0-9]*)/section-([0-9]*)/index.html" and
> "wildcard:store-*/section-*/index.html" are roughly equivalent matches
> (the regular expression is more specific).

Yes, but I think it's useful syntax sugar since

 <match type="regexp" pattern="...">

is much more readable and understandable than

 <match pattern="re:...">

> The idea of the matcher and selector is best applied by unrolling the URIs.
> The process of unrolling a URI can be demand based, or static.  If it is
> demand based, the URI is processed through the matchers as it is received,
> and it is added to the "good" map or the "bad" map.  If unrolling is statically
> done, the Sitemap expands the matches and examines the harddrive for the
> files that the match applies to.
>
> More a bit later.
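
To make the demand-based unrolling in the quoted text concrete, here is a minimal sketch of how I read it. Everything in it is an assumption for illustration: `UnrolledUriCache`, `addMatcher`, `resolve` and the pipeline names are hypothetical and are not Cocoon APIs. The first request for a URI walks the matchers in order; the answer then goes into a "good" map or a "bad" set, so repeat requests become a single hash lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of demand-based URI unrolling; not a Cocoon class.
final class UnrolledUriCache {
    // Ordered matchers: patterns.get(i) routes to pipelines.get(i).
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> pipelines = new ArrayList<>();
    // "good" map: exact URIs already resolved to a pipeline name.
    private final Map<String, String> good = new HashMap<>();
    // "bad" set: exact URIs already known to match nothing.
    private final Set<String> bad = new HashSet<>();
    // Counts walks through the matcher list, to make the cache effect visible.
    int slowLookups = 0;

    void addMatcher(String regex, String pipeline) {
        patterns.add(Pattern.compile(regex));
        pipelines.add(pipeline);
    }

    // Returns the pipeline name for the URI, or null if no matcher applies.
    String resolve(String uri) {
        String hit = good.get(uri);
        if (hit != null) {
            return hit;                       // constant-time path
        }
        if (bad.contains(uri)) {
            return null;                      // constant-time rejection
        }
        slowLookups++;
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(uri).matches()) {
                good.put(uri, pipelines.get(i));
                return pipelines.get(i);
            }
        }
        bad.add(uri);
        return null;
    }
}

final class UnrollDemo {
    public static void main(String[] args) {
        UnrolledUriCache cache = new UnrolledUriCache();
        cache.addMatcher("store-([0-9]*)/section-([0-9]*)/index\\.html", "section-index");
        System.out.println(cache.resolve("store-1/section-2/index.html"));
        System.out.println(cache.resolve("store-1/section-2/index.html"));
        System.out.println(cache.resolve("no-such-page"));
        System.out.println("slow lookups: " + cache.slowLookups);
    }
}
```

Note that the sitemap itself stays pattern-based here, so the administration cost does not grow with the URI space; only the cache does. A real version would have to bound the "bad" set, since arbitrary request URIs would otherwise grow it without limit.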
>
> > In fact, like Judson brilliantly suggested above, selectors perform real
> > 'routing' (choose one route into n) while matchers perform conditional
> > deviations that go back to the main flow.
> >
> > Using tubes and valves that would yield:
> >
> >                 +->[ ]--+
> >                 |       |
> >                 |       |
> >   matchers   -->*-------+--->
> >
> >
> >                   ->[ ]-->
> >                  /
> >   selectors  -->*-->[ ]-->
> >                  \
> >                   ->[ ]-->
> >
> > [having studied opto-electronics I've seen all sort of things with
> > optical pipelines, ie. fibers]
> >
> > As you can evidently see, topologically, the two are *not* equivalent:
> > you can't use one to make the other.
> >
> >                           -------------
> > Question: can one component perform both?
> >                           -------------
>
> It is not an easy thing to decide.  For instance, the Action and the Matcher
> could both be merged quite easily as they have similar semantics, though
> differing focus.  Consider the API for an Action:
>
> interface Action
> {
>     Map act(Redirector rd, Resolver rs, Map objectMap, String source, Parameters p);
> }
>
> It's at least close.  A matcher is basically a simplified version of this.

Yes, good point.

> In essence the redirector and resolver are removed from a Matcher, and a matcher
> has the ability to precompile the pattern.
>
> Both of them return a Map to define substitution variables.
>
> Contrast this with a Selector which returns a boolean.

Hmmm, what do others think of this?

> > the good-old if/else semantics does matchers
> >
> >  <if test="">
> >   ...
> >  </if>
> >
> > as well as selectors
> >
> >  <if test="">
> >   ...
> >   <else>
> >    <if test="">
> >     ...
> >    </if>
> >   </else>
> >  </if>
>
> Yuck.  I think this is why we had the Matcher/Selector idea to begin with.

Exactly!

> >                           ----------
> > Question: *should* they be merged?
> >                           ----------
> >
> > Well, this is the key question.
> >
> > The above syntax is too verbose and ugly for my personal taste.
> > Moreover, the <if> tag provides a strong negative appeal for people
> > approaching an XML markup language.
>
> I agree.
>
> >                           ------------
> > question: should selectors return maps of tokens?
> >                           ------------
> >
> > In that case, we achieve four things:
> >
> >  1) the interfaces Matcher and Selector are unified, thus removing the
> > perception that they are just doing the same thing. In fact, from the
> > topological views, the * (star) performs the routing process and it
> > doesn't make any sense to have two different ones. Especially if they
> > end up being the cut/paste clone of one another.
> >
> >  2) and-type boolean processing can be performed even for URI-matching,
> > thus providing an algorithmical way to increase sitemap interpretation.
> >
> >  3) back-compatibility at the sitemap level can be achieved
> >
> >  4) we don't use ugly markup code.
>
> The returned map should have a static lookup object, that way unrolled
> selectors would store the possible answers in a lookup with a callback
> for the logic that is performed for that case.  For instance:
>
> Map results = selector.select( "test", objectMap, params );
> Command callback = (Command) commandMap.get( results.get( Selector.RETURN_OBJECT ) );
> callback.execute( results, objectMap, params, pipeline );
>
> It simplifies a lot of implementation details, and the sitemap semantics
> for the improved selector would look like this:
>
> <selector type="regexp-uri">
>   <case test="store-([0-9]*)\/section-([0-9]*)\/index\.xhtml">
>     <generator uri="docs/store-{1}/section{2}/index.html" />
>     <transformer/>
>     <serializer/>
>   </case>
>   <case test="store-([0-9]*)\/compliance\.pdf">
>     <generator uri="docs/compliance-report.xsp">
>       <parameter name="storenbr" value="{1}"/>
>     </generator>
>     <transformer/>
>     <serializer/>
>   </case>
> </selector>
>
> This I think is not only more compact than several matchers, but
> the connotation is that every test is mutually exclusive.

I agree with you on the fact that sometimes the 'fall-thru' behavior of
matchers is misleading.

And I agree with you that even selectors should provide the ability to
return tokens.

So, for example (keeping syntax back compatible)

 <select type="uri">
   <when test="store-([0-9]*)\/section-([0-9]*)\/index\.xhtml" type="regexp">
     <generator src="docs/store-{1}/section{2}/index.html" />
     <transformer/>
     <serializer/>
   </when>
   <when test="store-*/compliance.pdf" type="wildcard">
     <generator src="docs/compliance-report.xsp">
       <parameter name="storenbr" value="{1}"/>
     </generator>
     <transformer/>
     <serializer/>
   </when>
   <otherwise>
     <generator src="{1}"/>  <!-- what does the 'otherwise' part return? -->
     <transformer/>
     <serializer/>
   </otherwise>
 </select>

but the question becomes: if we can't implement hash-based jumping based
on regexps, the above has no effect on performance since you have to
perform the regexp matching sequentially anyway, and hitting a
serializer has the effect of exiting the pipeline anyway, even on
matchers.

So, really, what benefits are we getting?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                       able to give birth to a dancing star.
<[EMAIL PROTECTED]>                          Friedrich Nietzsche
--------------------------------------------------------------------
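
The closing question can be illustrated with a small sketch of a token-returning selector. This is only a sketch under stated assumptions: `TokenSelector`, `when`, `select` and the `"case"` key are hypothetical names, not Cocoon interfaces, and plain `java.util.regex` stands in for whatever regexp engine a sitemap would use. It returns a map of tokens for the first test that matches, and it counts how many tests were tried, which shows that the lookup is still a sequential walk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a selector that returns tokens; not a Cocoon interface.
final class TokenSelector {
    private final List<Pattern> tests = new ArrayList<>();
    // Number of regexp tests evaluated so far, to expose the sequential cost.
    int testsTried = 0;

    // Registers one <when>-style test; order of registration is order of evaluation.
    void when(String regex) {
        tests.add(Pattern.compile(regex));
    }

    // Returns the tokens of the first matching test, or null when none match
    // (the 'otherwise' branch). Key "case" holds the index of the winning test,
    // keys "0".."n" hold the regexp groups.
    Map<String, String> select(String uri) {
        for (int i = 0; i < tests.size(); i++) {
            testsTried++;
            Matcher m = tests.get(i).matcher(uri);
            if (m.matches()) {
                Map<String, String> tokens = new LinkedHashMap<>();
                tokens.put("case", String.valueOf(i));
                for (int g = 0; g <= m.groupCount(); g++) {
                    tokens.put(String.valueOf(g), m.group(g));
                }
                return tokens;   // first match wins; later tests are skipped
            }
        }
        return null;
    }
}

final class SelectDemo {
    public static void main(String[] args) {
        TokenSelector selector = new TokenSelector();
        selector.when("store-([0-9]*)/section-([0-9]*)/index\\.xhtml");
        selector.when("store-([0-9]*)/compliance\\.pdf");
        System.out.println(selector.select("store-7/compliance.pdf"));
        System.out.println("tests tried: " + selector.testsTried);
    }
}
```

In this form the selector only changes the semantics (mutually exclusive cases plus tokens); the cost still grows with the number of cases tried before the hit. Any speedup would have to come from a separate exact-URI cache or from a hash-based jump over regexps, which this sketch does not attempt.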