Re: [RT] Fixing the CLI

Nicola Ken Barozzi Mon, 24 Feb 2003 09:57:11 -0800

Upayavira wrote, On 24/02/2003 18.24:

This will be possible soon. The functionality is there; the command line
option is missing ATM. If you need it urgently, you can fix it in 10
minutes.
Yes, it's on my TODO list, along with other CLI optimizations I discussed with Vadim :-)
Seeing as this was my fault (I didn't add an option to Main.java when I split Main into Main and CocoonBean, patch applied by Vadim), I have done it now.

Not your fault. It's an additional feature :-D

There's now an option (-e), which allows you to switch off confirmation of extensions (-e false will switch it off, default is true to maintain existing functionality).

I've also added an option to pre-load a class, so that the CLI can be used to generate database driven sites (-L <classname>). It can be repeated to allow the loading of more than one class.
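For anyone wondering what the two options amount to, here is a minimal sketch. It is not the actual Main.java code: the class name CliOptionsSketch, its fields and the hand-rolled argument loop are all hypothetical, and only the -e and -L flags come from the description above. It assumes that "pre-loading" simply means loading the class by name so that its static initializers run (which is how a JDBC driver typically registers itself).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two options; not the real Cocoon Main.java. */
final class CliOptionsSketch {
    /** -e true|false: confirm extensions; defaults to true to keep existing behaviour. */
    boolean confirmExtensions = true;

    /** -L classname: may be repeated, one class per occurrence. */
    final List<String> preloadClasses = new ArrayList<>();

    static CliOptionsSketch parse(String... args) {
        CliOptionsSketch opts = new CliOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            if ("-e".equals(args[i]) && i + 1 < args.length) {
                opts.confirmExtensions = Boolean.parseBoolean(args[++i]);
            } else if ("-L".equals(args[i]) && i + 1 < args.length) {
                opts.preloadClasses.add(args[++i]);
            }
        }
        return opts;
    }

    /** Loads every requested class by name so that its static initializers run. */
    void preload() {
        for (String name : preloadClasses) {
            try {
                Class.forName(name);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Cannot pre-load class: " + name, e);
            }
        }
    }
}
```

With this sketch, arguments such as "-e false -L some.Driver -L other.Driver" would switch off extension confirmation and queue two classes for loading before the crawl starts.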

As soon as I can get my Cocoon system to work (see invalid config message), I'll test and post a patch to Bugzilla.

Excellent! :-D

I'd be interested to hear about these CLI optimizations you refer to.

Well... hmmm... ok, let me see if I remember it well enough.

There are two sets of optimizations possible: traversing optimizations and sitemap short-circuits.

Traversing optimizations
-------------------------

As you know, the Cocoon CLI gets the content of a page 3 times.
I refactored these three calls to Cocoon into the following methods (in call order):

 1 - getLinks
      First the page is generated and the link view is used to
      get the links

 2 - getType (called in translateURI)
      Then the type of the page is needed, so we know if we need
      to add an extension or other things; basically to translate
      the URI

 3 - getPage
      Actually gets the page *and* uses the translated URIs in the links

Now, with the -e option we basically don't need step 2. If done correctly, this will increase the speed! :-)
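To make the saving concrete, here is a rough sketch of the traversal loop. The PageSource interface and the CrawlSketch class are hypothetical stand-ins, not the real CLI classes; only the three method names come from the list above. It also simplifies things by calling getType once per page, whereas the real code calls it from translateURI.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for whatever asks Cocoon to process a URI. */
interface PageSource {
    List<String> getLinks(String uri); // pass 1: generate the page, read the link view
    String getType(String uri);        // pass 2: content type, needed to translate the URI
    byte[] getPage(String uri);        // pass 3: the page itself
}

final class CrawlSketch {
    /** Crawls from the start URI and returns how many requests were sent to the source. */
    static int crawl(PageSource source, String start, boolean confirmExtensions) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(start);
        int requests = 0;
        while (!todo.isEmpty()) {
            String uri = todo.poll();
            if (!seen.add(uri)) {
                continue; // already processed
            }
            todo.addAll(source.getLinks(uri));
            requests++;
            if (confirmExtensions) {
                source.getType(uri); // only needed when extensions must be confirmed
                requests++;
            }
            source.getPage(uri);
            requests++;
        }
        return requests;
    }
}
```

In this simplified model a site with N pages costs 3N requests with confirmation switched on and 2N with it switched off.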

So we have two steps left: getting links and getting the page.
If we can make them into a single step we're done.

Cocoon has the concept of pluggable pipelines, and each pipeline is responsible for connecting the various components. If we used a pipeline that simply inserts, between the source and the next component, a pipe that records all links (org.apache.cocoon.xml.xlink.ExtendedXLinkPipe.java) into the Environment, we could effectively get both the result and the links in a single pass.

NOTE: This is possible *only* if we use the -e option. If we don't, the URL translation needed makes it impossible to do it in a single step, unless we keep the documents in memory and use a recursive algorithm, which poses bigger problems of scalability.
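The single-pass idea can be illustrated outside Cocoon with plain SAX. The sketch below is not ExtendedXLinkPipe and does not touch the Cocoon Environment; LinkRecordingFilter is a hypothetical class built on the JDK's XMLFilterImpl, and treating href and src attributes as "links" is an assumption made for the example. It records links while passing every event downstream unchanged, so one parse yields both the serialized page and the link list.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pass-through filter: records links and forwards every event untouched. */
final class LinkRecordingFilter extends XMLFilterImpl {
    final List<String> links = new ArrayList<>();

    LinkRecordingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getLocalName(i);
            if (name == null || name.isEmpty()) {
                name = atts.getQName(i); // fall back when namespace processing is off
            }
            if ("href".equals(name) || "src".equals(name)) {
                links.add(atts.getValue(i)); // assumption: these attributes carry the links
            }
        }
        super.startElement(uri, localName, qName, atts); // keep the stream flowing
    }

    /** Parses the XML once, appends the links found to linksOut, returns the serialized page. */
    static String process(String xml, List<String> linksOut) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            LinkRecordingFilter filter =
                    new LinkRecordingFilter(factory.newSAXParser().getXMLReader());
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            linksOut.addAll(filter.links);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document", e);
        }
    }
}
```

The point of the design is that the filter sits in the event stream rather than running as a separate request, which is why it only helps when no URI translation has to happen before the links are written out.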


Sitemap short-circuits
-------------------------

Sometimes in the sitemap you will find things like:

    <map:match pattern="*/**">
      <map:read mime-type="text/html" src="docs/{1}/{2}.html"/>
    </map:match>

In this case the CLI fails to copy all the HTML files that the webapp version serves.

We *could* pass them through the pipeline and traverse the links, but what if we didn't want to touch the HTML at all? Imagine also that those HTML files are 5MB of Javadocs... ;-)

So in this case the CLI could see that we have a match with a reader on the local filesystem, and locally "invert" the pipeline with an optimization. That is, copy all html files under the docs dir, which in Java can be done orders of magnitude faster than under Cocoon.
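A sketch of what the "inverted" reader could boil down to once the CLI has decided that a match is a plain file read: a straight recursive copy. This is only an illustration under assumptions. ReaderShortCircuitSketch and copyHtml are hypothetical names, detecting the reader in the sitemap is not shown, and it uses java.nio.file, which is newer than the JDKs Cocoon targets today.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical short-circuit: copy the files directly instead of running them through Cocoon. */
final class ReaderShortCircuitSketch {
    /** Copies every .html file under docsDir into destDir, keeping relative paths. */
    static int copyHtml(Path docsDir, Path destDir) {
        try (Stream<Path> files = Files.walk(docsDir)) {
            int copied = 0;
            for (Path src : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(src)
                        || !src.getFileName().toString().endsWith(".html")) {
                    continue; // skip directories and non-HTML files
                }
                Path target = destDir.resolve(docsDir.relativize(src).toString());
                Files.createDirectories(target.getParent());
                Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The files never enter a pipeline, so their content is left untouched, which is exactly what we want for something like a Javadoc tree.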

--
Nicola Ken Barozzi                   [EMAIL PROTECTED]
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)