On Mon, 4 Aug 2003 22:38:55 +1000, "Jeff Turner" <[EMAIL PROTECTED]> said:
> On Mon, Aug 04, 2003 at 08:25:01AM +0000, Upayavira wrote:
> > On Sat, 2 Aug 2003 22:08:21 +1000, "Jeff Turner" <[EMAIL PROTECTED]> said:
> > > Hi,
> > >
> > > I'm tinkering around with the CLI, thinking how to add
> > > don't-crawl-this-page support, and have some questions on how cli.xconf
> > > currently works. The following block in cli.xconf has me confused..
> >
> > Jeff. Great to see you're engaging with it!
>
> It doubled Forrest's speed - I love it ;)
Great. And there's more we can do.
> > I have also been working on the CLI. I've spent my week's spare time
> > completely reworking it. I'll post separately about what I've been up to,
> > but basically the whole thing should be much easier to understand, with a
> > separate crawler class, a separate class for handling Cocoon
> > initialisation, and another for handling URI arithmetic (which you're
> > talking about below). As to adding exclusions, I think it should merely
> > be a question of identifying the syntax. The rest, with my new code,
> > should be pretty easy (e.g. tell the crawler what to ignore with a set of
> > wildcard parameters).
>
> Sounds marvellous.
I've started debugging now. I'll aim to commit later this week.
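For the exclusions, I'm imagining telling the crawler what to skip with
wildcard patterns, roughly along these lines (element and attribute names
are only a sketch, nothing is decided yet):

    <uris>
      <uri src="index.html" dest="build/site/"/>
      <exclude pattern="apidocs/**"/>
      <exclude pattern="**/*.pdf"/>
    </uris>

i.e. anything matching an exclude pattern would simply never be handed to
Cocoon by the crawler.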
<snip/>
> > When I've got this going, I'm going to convert the xconf code to use a
> > Configuration object, and then write an Ant task to do the same
> > ProcessXConf, so that you can have the xconf code directly in your Ant
> > script. This Ant task will be a simple wrapper around the bean, and
> > should be pretty trivial.
>
> Mmm.. nice. Might be some ideas to steal from Ant here, notably the idea
> of PatternSets and Mappers.
Yup. I'm keen to see what we can steal. Unfortunately, we'll have to code
it twice - it doesn't seem to be possible to share code between Ant and
Cocoon.
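To make the Ant idea concrete, I'm picturing something in the build file
along these lines (the task name, attributes and nested elements are all
invented, purely for illustration):

    <target name="static-site">
      <!-- hypothetical task wrapping the Cocoon bean -->
      <cocoon context-dir="build/webapp" dest-dir="build/site">
        <uri src="index.html"/>
        <patternset>
          <exclude name="apidocs/**"/>
        </patternset>
      </cocoon>
    </target>

The patternset part is exactly where Ant's PatternSet/Mapper machinery
would pay off, even if we do end up re-implementing it on the Cocoon side.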
> > I have also, I think, just sorted my problem with my caching code not
> > working. Basically, the Cocoon cache is transient, so it is lost every
> > time Cocoon starts. And Cocoon is started every time the CLI
> > starts. So if we want to have the CLI only generate new pages based upon
> > the cache, we've got to make the cache for the CLI persistent. Again, see
> > separate thread.
>
> This would be really awesome :) Lots of people have asked if Forrest
> could only regenerate pages that have changed. I'll defer further
> thoughts till the other thread.
Thread will come when I've got the basic code working.
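Very roughly, the idea is just to point the CLI's cache at a directory
that survives between runs, so in cli.xconf terms it might end up looking
something like this (hypothetical elements; the real mechanism is what the
other thread will be about):

    <work-dir>build/work</work-dir>
    <cache persistent="true" dir="build/work/cli-cache"/>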
> ...
> > > Come to think of it, the attribute name 'src'
> > > doesn't really make sense. What is the "source" of a Cocoon URI? It
> > > would be the XML (documents/index.xml), which is not what we're
> > > specifying in @src.
> >
> > It is the source for a source/destination pair. You could see it as a
> > cocoon: protocol source (almost). Would you suggest something different?
>
> No, makes sense given that explanation.
Great.
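To spell the pairing out for anyone else reading along: each entry is just
a source URI plus the destination file it gets written to, e.g. (values
purely illustrative):

    <uri src="docs/index.html" dest="build/site/docs/index.html"/>

So @src names what Cocoon is asked for, and @dest names where the result
lands on disk.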
> > > I have the feeling that cli.xconf's job, mapping URIs to the filesystem,
> > > could potentially be quite intricate. It is roughly an inverse of what
> > > the sitemap does. Perhaps we need an analogous syntax?
> >
> > Perhaps. I think we've only just started trying to work out what is
> > possible here. I'd be pleased to carry on the conversation, as what we
> > have at the moment is purely what I thought best, and not the result of
> > much community discussion.
> >
> > There's a lot we could discuss here. For example, how do we handle the
> > situation where we want to crawl a number of pages, but don't want to
> > have to repeat the destination for each of them? I think we could come up
> > with an elegant configuration for this. My <uri> thing is only the
> > beginning.
>
> There is ${variable} interpolation code in Avalon, if that helps, e.g.
> ${context-root} in logkit.xconf.
I'll look into that.
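That could tie in nicely with the 'don't repeat the destination' case
above. Something like this, say (syntax invented, just to show the shape):

    <uris dest="${dest-dir}/docs/">
      <uri src="docs/index.html"/>
      <uri src="docs/faq.html"/>
    </uris>

where ${dest-dir} is resolved once, Avalon-style, rather than being
spelled out on every <uri>.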
> > The first thing to do is to start identifying the possible use cases for
> > URI mappings, so that we can see the range of the problem we're trying to
> > solve (and take it beyond the scope of just fixing my problems only!).
>
> Well, two observations:
>
> 1) Hosting a live Cocoon site is a PITA:
>
> - One has to fight with sysadmins to install JVMs. Many site hosts
> (like SF) don't even offer Java-based services.
> - JVMs permanently chew up vast amounts of memory
> - Servlet containers hang, crash, throw OutOfMemoryExceptions and are
> generally unreliable.
> - Cocoon is not particularly fast
>
> 2) A surprising number of sites **don't need to be dynamic**
>
> So in walks our hero, the CLI. We can get most of the magic of Cocoon,
> with none of the pain. Develop a site with a live Cocoon, and when
> you're ready to deploy, serialize it to disk and serve through Apache.
>
> That's why I think the CLI is very important. More than *anything* else,
> it has the potential to vastly widen Cocoon's audience.
>
> So from this perspective, the need is simple. We need the CLI to provide
> as accurate a representation of the live site as possible. Generally
> this means simply mirroring the URI structure to disk.
> Currently, the biggest unmet need is the ability to exclude certain URLs.
> There is usually non-Cocoon-generated content like Javadocs, or other
> parts of the site, which needs to be excluded.
Well, let's get that working well.
Are you willing to test my new version when it's ready?
Regards, Upayavira