Re: [RT] Implementation of VPCs and "multi-relative" source resolving (long)

Daniel Fagerstrom Mon, 10 Jan 2005 14:58:34 -0800

A somewhat late answer ;) But I have spent some more time thinking about this during the discussion about exporting flowscript functions from blocks.

Sylvain's original mail can be found at http://marc.theaimsgroup.com/?t=110064560900003&r=1&w=2.

What I will discuss is rather subtle and involved stuff, but discussing the gory details is IMO important for getting a robust implementation of VPCs that behave in an "expected" way.

Before discussing source resolving in VPCs I would like to recall the concepts of static and dynamic binding from programming language theory, which IMO give some further insight into the situation.

Static and Dynamic Variable Binding
===================================

Let's look at an example program:

var x=1;

function a() {
 var y=2;
 return function() {var z=3; return x+y+z;}
}

function b() {
 var y=4;
 var c=a();
 return c();
}

Here the question is: what should b() return?

In x+y+z it is rather obvious that x should be bound to the globally defined value 1 and z to the locally defined value 3, illustrating that local and global variables have a rather obvious interpretation. But what about the non-local variable y? Here there are two reasonable alternatives:

Static binding: variables are bound in the context where they are defined, here y=2, and b()==6.

Dynamic binding: variables are bound in the context where they are executed, here y=4, and b()==8.

In early implementations of Lisp, dynamic binding was used, but that gives poor isolation, as you must take the context where the function is executed into account to understand what it does, rather than just looking at the definition. Since then it has been generally accepted that static binding is better than dynamic binding for functions.
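JavaScript (and hence flowscript) uses static binding, so the example program can be run directly to get the first result. The second result can only be emulated. A minimal sketch, where the explicit environment stack and the names `lookup`, `withEnv`, `aDynamic` and `bDynamic` are assumptions introduced purely for illustration:

```javascript
// Static (lexical) binding: this is what JavaScript itself does.
var x = 1;

function a() {
  var y = 2;
  return function () { var z = 3; return x + y + z; };
}

function b() {
  var y = 4;            // not visible to the closure created in a()
  var c = a();
  return c();
}

// Emulated dynamic binding (hypothetical helpers, not a language feature):
// non-local names are looked up in a stack of call-time environments.
var envStack = [{ x: 1 }];

function lookup(name) {
  for (var i = envStack.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(envStack[i], name)) {
      return envStack[i][name];
    }
  }
  throw new Error("unbound variable: " + name);
}

function withEnv(env, body) {
  envStack.push(env);
  try { return body(); } finally { envStack.pop(); }
}

function aDynamic() {
  // y = 2 is only on the stack while aDynamic() itself is running
  return withEnv({ y: 2 }, function () {
    return function () {
      return withEnv({ z: 3 }, function () {
        return lookup("x") + lookup("y") + lookup("z");
      });
    };
  });
}

function bDynamic() {
  return withEnv({ y: 4 }, function () {
    var c = aDynamic();
    return c();         // y is now found in bDynamic's environment
  });
}

console.log(b());        // 6
console.log(bDynamic()); // 8
```

The emulated version shows the isolation problem: what the inner function returns depends on who happens to call it.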

In object oriented languages, member variables used in member functions are statically bound (which is a necessity for getting object orientation). On the other hand, when we write a class B that extends another class A, we can see that the member functions are dynamically bound. I.e. if we call a function in A that uses a function that is defined in both A and B, the latter will be used (this is of course somewhat more complicated than pure dynamic binding, as we have fallback to functions higher up in the class hierarchy).
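The dynamic binding of member functions, including the fallback, can be illustrated with a small sketch. The class and method names here (`run`, `step`, `describe`) are made up for the example:

```javascript
class A {
  step() { return "A.step"; }
  describe() { return "described by A"; }
  // run() is defined only in A, but which step() it calls
  // is decided by the object it is invoked on.
  run() { return this.step(); }
}

class B extends A {
  step() { return "B.step"; }   // overrides A.step
  // describe() is not overridden, so A's version is the fallback
}

console.log(new A().run());      // A.step
console.log(new B().run());      // B.step
console.log(new B().describe()); // described by A
```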

So we can see that static binding is good when you _use_ a function from somewhere else while dynamic binding is good for _extending_ something.

VPC Source Resolving
====================

So what does this have to do with source resolving in VPCs? Well, using a VPC is like calling a function or a procedure, resolving an absolute URI is like dereferencing a global variable, and resolving a relative URI is like dereferencing a non-local variable. So what binding strategy should we use for relative URIs in VPCs?

Sylvain Wallez wrote:
<snip/>

The problem with source resolving is that the base URI used to resolve relative URIs changes when we enter a subsitemap: relative sources are relative to the directory containing the "current" sitemap.

That means that the base URI used to resolve e.g. the "src" attribute of a <map:generate> is the one of the sitemap containing that statement, and not the sitemap where the component was declared, which can be a parent sitemap of the current one.

This isn't a problem with URIs part of a statement ("src" attribute and <map:parameter>) but is a real problem for URIs part of the component configuration. That's what happens with the I18nTransformer as catalogue locations are URIs defined in the component declaration, thus relative to the sitemap where the component is _declared_. Unfortunately, they are resolved relatively to where the component is first _instantiated_, which can occur randomly in any of the current sitemap and its child sitemaps, depending on how pools are managed. The practical result is that we cannot reliably declare an i18n transformer for use by a tree of subsitemaps.

Now that we have a per-sitemap Avalon Context, we can also store in that context the base URI of the sitemap declaring the component. The i18n transformer just has to use that base URI to access the catalogues defined in its configuration.

That's what I called "multi-relative" source resolving in the subject of this post: URIs coming from a component's configuration will have to be resolved relatively to the base URI contained in the Avalon context, whereas URIs coming from sitemap statements are resolved using the relative URI of the sitemap that is currently executing.

Expressed in the above terminology, we could say that components today use a dynamic binding strategy for relative URIs, which creates unexpected and unwanted behaviour. Sylvain describes a mechanism for using static binding instead. Excellent IMO.
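The difference between the two strategies for a configuration URI can be sketched with plain URL resolution. This is only an illustration: the sitemap locations and the catalogue path are invented, and the standard `URL` constructor stands in for Cocoon's source resolver:

```javascript
// Hypothetical setup: a component declared in a parent sitemap
// and first used from a child sitemap.
const declaringSitemap = "file:///site/sitemap.xmap";
const executingSitemap = "file:///site/sub/sitemap.xmap";

// Relative URI taken from the component configuration (assumed value).
const catalogue = "i18n/messages.xml";

// Dynamic binding: resolved against whichever sitemap is executing.
const dynamicResult = new URL(catalogue, executingSitemap).href;

// Static binding: resolved against the sitemap that declared the component.
const staticResult = new URL(catalogue, declaringSitemap).href;

console.log(dynamicResult); // file:///site/sub/i18n/messages.xml
console.log(staticResult);  // file:///site/i18n/messages.xml
```

With dynamic binding the result changes with the sitemap in which the component is first instantiated, which is exactly the unreliable behaviour described for the I18nTransformer.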

Still following? Now let's see source resolving in VPCs...
                         --- oOo ---
With VPCs, the problem is worse than with regular components, as VPCs are components defined by sitemap snippets with their "src" and <map:parameter>. So what does "relative" mean in this context? Is it relative to the calling sitemap or relative to the sitemap that defines the VPC? The result is "it depends"!

It depends on whether the URI is passed from the calling environment (it's then relative to the calling sitemap) or is some local data used by the VPC implementation such as an XSLT (it's then relative to the sitemap defining the VPC).

So how do we distinguish them? A solution was proposed [1] where we added some typing information to the sitemap statements calling the VPC, so that URIs could be absolutized before the actual call.

That is actually wrong, as it forces the user of a component to explicitly indicate that some particular action should be taken on a parameter, whereas this information is related to the implementation of the component. Furthermore, forgetting to specify that absolutization has to be performed can lead to weird behaviours difficult to debug.

So, it's the VPC's responsibility to make explicit in its definition what values coming from the caller have to be absolutized relatively to the calling sitemap.

For this, I propose that VPC definitions have additional statements defining what parameters have to be absolutized, e.g.:
<map:generator name="foo">
  <map:absolutize param="src"/>
  <map:absolutize param="bar"/>
  <map:generate type="file" src="{src}">
    <map:parameter name="baz" value="bar"/>
  </map:generate>
  <map:transform src="data/{skin}.xslt"/>
</map:generator>
The input parameters "src" (actually the "src" attribute in the calling statement) and "bar" are first absolutized relatively to the calling sitemap, and then the base URI of the sitemap defining the VPC becomes the new relative context, used e.g. to resolve "data/{skin}.xslt".

That way, we can also implement multi-relative source resolving in sitemap statements.

Here too I agree with the analysis of the situation: relative URIs within the VPC should be resolved relative to the sitemap (block) they are defined in, i.e. static binding. URIs passed as parameters to VPCs should be resolved relative to the calling sitemap (block). However, there are some subtleties in the parameter passing that make me suggest a somewhat different implementation.
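The semantics that Sylvain's absolutize proposal aims for can be sketched as follows. This is not Cocoon code: the `vpc` object, the `callVpc` function and all paths are assumptions made up for the illustration, and the standard `URL` constructor again stands in for the source resolver:

```javascript
// Hypothetical VPC definition: its own base URI, the parameters that
// must be absolutized, and one internal relative URI.
const vpc = {
  base: "file:///blocks/a/sitemap.xmap",
  absolutize: ["src"],
  internal: "data/{skin}.xslt",
};

function callVpc(vpc, callerBase, params) {
  const resolved = Object.assign({}, params);
  // Parameters coming from the caller are absolutized against the caller.
  for (const name of vpc.absolutize) {
    if (name in resolved) {
      resolved[name] = new URL(resolved[name], callerBase).href;
    }
  }
  // URIs written inside the VPC are resolved against the defining sitemap.
  const stylesheet = new URL(
    vpc.internal.replace("{skin}", resolved.skin),
    vpc.base
  ).href;
  return { src: resolved.src, stylesheet: stylesheet };
}

const result = callVpc(vpc, "file:///blocks/b/sitemap.xmap", {
  src: "content/page.xml",
  skin: "blue",
});

console.log(result.src);        // file:///blocks/b/content/page.xml
console.log(result.stylesheet); // file:///blocks/a/data/blue.xslt
```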

I will describe things as if each block or sitemap has its own source resolver that knows how to resolve relative URIs in that context (and that also has access to all public resources with absolute URIs). I find it easier to describe the behaviour in such terms. Whether that is a good implementation strategy or not is another question.

Now, the "ideal" solution would IMO be that the VPC declares its URI input parameters as Source and that the framework resolves the input URIs with the caller's source resolver. That would give complete isolation between the source resolver of the caller and the source resolver of the block. This solution is in practice not possible, as the SitemapModelComponent interface (that is used by most sitemap components) takes a String and a SourceResolver as arguments rather than a Source. And changing the SitemapModelComponent interface would suddenly make the Avalon change to Serviceable seem like a relatively popular decision ;) So we have no other choice than giving the pipeline components in the VPC URI strings as arguments.

Sylvain solves this problem by transforming relative input URIs to absolute URIs (relative to the caller's context). This absolute URI can then be resolved by the VPC's resolver. This is a good solution that gives the correct semantics IMO. But it imposes some restrictions on what we can do.

Say that we have a block B that wants to apply a VPC from block A on some of its files or some of its internal pipelines. This is certainly a relevant and useful thing to be able to do. But this creates problems, as neither the files nor the internal pipelines are reachable from the global context.

One way to solve this would be to make all resources reachable from the global context, but I find that very unattractive, as that takes away the isolation between blocks that IMO is one of the most important reasons for introducing them. Another possibility is to require a block to make all resources that it wants to use in external VPCs available through its sitemap. But that also breaks isolation. Still another possibility is to send the resolver of the calling component to the called one, but that is even worse, as it means both that a component must make its internals available to all components that it wants to use and that the called component must know when it should use its own and when it should use its caller's source resolver.

I haven't found any simple and elegant solution to this problem, but at least I think that I have a possible solution:

* A source parameter in the VPC is declared as a Source and resolved to a Source by the framework, which uses the caller's source resolver. This is like in the "ideal" case described above.
* The resolved Source object is put in some temporary place where it can be reached by a special protocol, "param:arg1" say.
* Then this URI, "param:arg1", is used as parameter to the internal components in the VPC.

More complicated than I would have liked, but AFAICS it should solve the problems that I outlined above.
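The three steps can be sketched like this. It is only a model of the idea, not a proposal for the actual API: the `SourceResolver` class here is a made-up stand-in (not Cocoon's interface), `invokeVpc` and the block locations are invented, and a "Source" is reduced to an object holding an absolute URI:

```javascript
// Hypothetical per-block resolver: resolves relative URIs against its own
// base and additionally understands the "param:" protocol.
class SourceResolver {
  constructor(base) {
    this.base = base;
    this.params = new Map();   // temporary store for resolved Sources
  }
  resolve(uri) {
    if (uri.startsWith("param:")) {
      const name = uri.slice("param:".length);
      if (!this.params.has(name)) {
        throw new Error("no such parameter: " + name);
      }
      return this.params.get(name);
    }
    return { uri: new URL(uri, this.base).href };
  }
}

function invokeVpc(callerResolver, vpcResolver, args, body) {
  const paramUris = {};
  for (const [name, uri] of Object.entries(args)) {
    // Step 1: the framework resolves the argument with the caller's resolver.
    const source = callerResolver.resolve(uri);
    // Step 2: the Source is stored where "param:<name>" can reach it.
    vpcResolver.params.set(name, source);
    // Step 3: the internal components only ever see the "param:" URI.
    paramUris[name] = "param:" + name;
  }
  try {
    return body(paramUris);
  } finally {
    vpcResolver.params.clear();  // the store lives only for this call
  }
}

const blockB = new SourceResolver("file:///blocks/b/");
const blockA = new SourceResolver("file:///blocks/a/");

const out = invokeVpc(blockB, blockA, { arg1: "private/doc.xml" }, (p) => ({
  input: blockA.resolve(p.arg1).uri,
  xslt: blockA.resolve("data/style.xslt").uri,
}));

console.log(out.input); // file:///blocks/b/private/doc.xml
console.log(out.xslt);  // file:///blocks/a/data/style.xslt
```

The VPC never gets hold of the caller's resolver, yet it can read a resource that is private to block B, and only for the duration of the call.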

                        --- oOo ---

Ok, isn't this overly paranoid, do we need this level of isolation?

IMO we need that. One of the reasons that OO has become so popular and successful for building large systems is that it provides mechanisms for isolating components. Without isolation you must check _all_ code in _all_ components if something within a component was changed in an unexpected way.

Also, even if we are probably not going to have class loader isolation in the first version of blocks, we should at least design for isolation to the best of our knowledge.

We may actually want to go a bit further by allowing any computation to provide input parameters using input modules, e.g.
<map:generator name="foo">
  <map:parameter name="src" value="{absolutize:{src}}"/>
  ...

I prefer explicit declaration of all input parameters, so that one can see the contract without needing to browse the VPC's implementation.

But the source-resolving problem is not finished...
                         --- oOo ---
The last source-resolving problem is related to URIs that may be present in the SAX stream, e.g. XInclude URIs. What are they relative to?

My feeling here is that we need to distinguish for a single VPC the base URI used to resolve URIs within the setup phase (i.e. "src" and <map:parameter>) and the base URI used to resolve URIs during the processing phase.

That could be achieved using an additional attribute on the component declaration, i.e. in the above example something like

<map:generator name="foo" stream-uris-base="local|caller">

First, I wouldn't like a VPC to be able to resolve URIs in its caller's context. This is based on my opinions about isolation discussed above. If the caller wants to use XIncludes that involve relative URIs, it can use an XIncludeTransformer on the stream before passing it to a VPC. Second, running an XIncludeTransformer on an input stream of a VPC means that all internal URIs in the VPC are exposed. But if the VPC writer finds that ok, I would assume that resolving them in the VPC context would be the most expected result.

I think it could be a good idea to make the XIncludeTransformer (and similar things) configurable so that one can require them to only resolve absolute URIs for VPC usage on input streams.
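Such a restriction boils down to a check on each include URI before it is resolved. A minimal sketch, assuming a hypothetical `absoluteOnly` option (no such XIncludeTransformer setting is implied to exist) and using the scheme syntax from RFC 3986 to decide what counts as absolute:

```javascript
// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
const SCHEME = /^[A-Za-z][A-Za-z0-9+.-]*:/;

function isAbsoluteUri(uri) {
  return SCHEME.test(uri);
}

// Hypothetical hook: called for every href found in the SAX stream.
function checkInclude(href, options) {
  if (options.absoluteOnly && !isAbsoluteUri(href)) {
    throw new Error("relative include URI rejected: " + href);
  }
  return href;
}

console.log(isAbsoluteUri("cocoon:/foo/bar"));   // true
console.log(isAbsoluteUri("data/fragment.xml")); // false
```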

<snip/>

Polymorphic Blocks
==================

So far I (and before me Sylvain) have proposed that static binding is the best strategy for VPCs that are _used_ by some other sitemap or block. But as a side note it might be worth mentioning that we can make good use of dynamic binding as well.

In Stefano's Cocoon Blocks document http://wiki.apache.org/cocoon/BlockIntroduction a mechanism for block inheritance is described, where a block B can extend a block A. Say that block A makes /foo/bar available through its sitemap; then block B can override /foo/bar by defining it in its own sitemap. If B doesn't define it, the version from block A will be used as fallback.

We can push this further by allowing dynamic resolution of relative URIs. We could introduce a protocol "dynamic:" (or maybe "polymorphic:") that resolves URIs according to the dynamic strategy.

If block A uses "dynamic:/foo/bar" within some of its pipelines, the _extending_ block B will be able to override the default behaviour by providing its own version of /foo/bar. This is very useful if you have a block that uses some default configuration documents and content files. Then you can customize its behaviour step by step by providing your own versions of what you want to change.
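A sketch of how such a protocol could behave, with block inheritance modelled as a simple parent chain. The `Block` class, the `resolve` function and the resource contents are all invented for the example:

```javascript
// Hypothetical block: a set of resources plus an optional block it extends.
class Block {
  constructor(name, parent, resources) {
    this.name = name;
    this.parent = parent;
    this.resources = new Map(Object.entries(resources));
  }
  // Look a path up in this block, falling back to the extended blocks.
  lookup(path) {
    for (let block = this; block; block = block.parent) {
      if (block.resources.has(path)) return block.resources.get(path);
    }
    throw new Error("not found: " + path);
  }
}

// definingBlock: where the pipeline using the URI was written.
// currentBlock:  the (possibly extending) block that is being served.
function resolve(uri, definingBlock, currentBlock) {
  if (uri.startsWith("dynamic:")) {
    // Dynamic strategy: start at the most derived block.
    return currentBlock.lookup(uri.slice("dynamic:".length));
  }
  // Static strategy: start at the block that defined the pipeline.
  return definingBlock.lookup(uri);
}

const blockA = new Block("A", null, {
  "/foo/bar": "bar from A",
  "/foo/baz": "baz from A",
});
const blockB = new Block("B", blockA, { "/foo/bar": "bar from B" });

console.log(resolve("dynamic:/foo/bar", blockA, blockB)); // bar from B
console.log(resolve("/foo/bar", blockA, blockB));         // bar from A
console.log(resolve("dynamic:/foo/baz", blockA, blockB)); // baz from A
```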

We have used something like that through some simple "sitemap magic" in a couple of our applications for reusing common parts, with good results.

                        --- oOo ---

WDYT?

/Daniel
