On Mon, 4 Aug 2003 22:38:55 +1000, "Jeff Turner" <[EMAIL PROTECTED]> said:
> On Mon, Aug 04, 2003 at 08:25:01AM +0000, Upayavira wrote:
> > On Sat, 2 Aug 2003 22:08:21 +1000, "Jeff Turner" <[EMAIL PROTECTED]> said:
> > > Hi,
> > >
> > > I'm tinkering around with the CLI, thinking how to add
> > > don't-crawl-this-page support, and have some questions on how cli.xconf
> > > currently works. The following block in cli.xconf has me confused..
> >
> > Jeff. Great to see you're engaging with it!
>
> It doubled Forrest's speed - I love it ;)
Great. And there's more we can do.
> > I have also been working on the CLI. I've spent my week's spare time
> > completely reworking it. I'll post separately about what I've been up to,
> > but basically the whole thing should be much easier to understand, with a
> > separate crawler class, a separate class for handling Cocoon
> > initialisation, and another for handling URI arithmetic (which you're
> > talking about below). As to adding exclusions, I think it should merely
> > be a question of identifying the syntax. The rest, with my new code,
> > should be pretty easy (e.g. tell the crawler what to ignore with a set of
> > wildcard parameters).
>
> Sounds marvellous.
I've started debugging now. I'll aim to commit later this week.
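For the exclusions, I'm imagining telling the crawler what to skip with
wildcard patterns, roughly along these lines (element and attribute names
are only a sketch, nothing is decided yet):

    <uris>
      <uri src="index.html" dest="build/site/"/>
      <exclude pattern="apidocs/**"/>
      <exclude pattern="**/*.pdf"/>
    </uris>

i.e. anything matching an exclude pattern would simply never be handed to
Cocoon by the crawler.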
<snip/>
> > When I've got this going, I'm going to convert the xconf code to use a
> > Configuration object, and then write an Ant task to do the same
> > ProcessXConf, so that you can have the xconf code directly in your Ant
> > script. This Ant task will be a simple wrapper around the bean, and
> > should be pretty trivial.
>
> Mmm.. nice. Might be some ideas to steal from Ant here, notably the idea
> of PatternSets and Mappers.
Yup. I'm keen to see what we can steal. Unfortunately, we'll have to code
it twice - it doesn't seem to be possible to share code between Ant and
Cocoon.
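To make the Ant idea concrete, I'm picturing something in the build file
along these lines (the task name, attributes and nested elements are all
invented, purely for illustration):

    <target name="static-site">
      <!-- hypothetical task wrapping the Cocoon bean -->
      <cocoon context-dir="build/webapp" dest-dir="build/site">
        <uri src="index.html"/>
        <patternset>
          <exclude name="apidocs/**"/>
        </patternset>
      </cocoon>
    </target>

The patternset part is exactly where Ant's PatternSet/Mapper machinery
would pay off, even if we do end up re-implementing it on the Cocoon side.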
> > I have also, I think, just sorted my problem with my caching code not
> > working. Basically, the Cocoon cache is transient, so it is lost every
> > time Cocoon starts. And Cocoon is started every time the CLI
> > starts. So if we want to have the CLI only generate new pages based upon
> > the cache, we've got to make the cache for the CLI persistent. Again, see
> > separate thread.
>
> This would be really awesome :) Lots of people have asked if Forrest
> could only regenerate pages that have changed. I'll defer further
> thoughts till the other thread.
Thread will come when I've got the basic code working.
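Very roughly, the idea is just to point the CLI's cache at a directory
that survives between runs, so in cli.xconf terms it might end up looking
something like this (hypothetical elements; the real mechanism is what the
other thread will be about):

    <work-dir>build/work</work-dir>
    <cache persistent="true" dir="build/work/cli-cache"/>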
> ...
> > > Come to think of it, the attribute name 'src'
> > > doesn't really make sense. What is the "source" of a Cocoon URI? It
> > > would be the XML (documents/index.xml), which is not what we're
> > > specifying in @src.
> >
> > It is the source for a source/destination pair. You could see it as a
> > cocoon: protocol source (almost). Would you suggest something different?
>
> No, makes sense given that explanation.
Great.
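To spell the pairing out for anyone else reading along: each entry is just
a source URI plus the destination file it gets written to, e.g. (values
purely illustrative):

    <uri src="docs/index.html" dest="build/site/docs/index.html"/>

So @src names what Cocoon is asked for, and @dest names where the result
lands on disk.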
> > > I have the feeling that cli.xconf's job, mapping URIs to the filesystem,
> > > could potentially be quite intricate. It is roughly an inverse of what
> > > the sitemap does. Perhaps we need an analogous syntax?
> >
> > Perhaps. I think we've only just started trying to work out what is
> > possible here. I'd be pleased to carry on the conversation, as what we
> > have at the moment is purely what I thought best, and not the result of
> > much community discussion.
> >
> > There's a lot we could discuss here. For example, how do we handle the
> > situation where we want to crawl a number of pages, but don't want to
> > have to repeat the destination for each of them? I think we could come up
> > with an elegant configuration for this. My <uri> thing is only the
> > beginning.
>
> There is ${variable} interpolation code in Avalon, if that helps, e.g.
> ${context-root} in logkit.xconf.
I'll look into that.
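That could tie in nicely with the 'don't repeat the destination' case
above. Something like this, say (syntax invented, just to show the shape):

    <uris dest="${dest-dir}/docs/">
      <uri src="docs/index.html"/>
      <uri src="docs/faq.html"/>
    </uris>

where ${dest-dir} is resolved once, Avalon-style, rather than being
spelled out on every <uri>.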
> > The first thing to do is to start identifying the possible use cases for
> > URI mappings, so that we can see the range of the problem we're trying to
> > solve (and take it beyond the scope of just fixing my problems only!).
>
> Well, two observations:
>
> 1) Hosting a live Cocoon site is a PITA:
>
> - One has to fight with sysadmins to install JVMs. Many site hosts
> (like SF) don't even offer Java-based services.
> - JVMs permanently chew up vast amounts of memory
> - Servlet containers hang, crash, throw OutOfMemoryExceptions and are
> generally unreliable.
> - Cocoon is not particularly fast
>
> 2) A surprising number of sites **don't need to be dynamic**
>
> So in walks our hero, the CLI. We can get most of the magic of Cocoon,
> with none of the pain. Develop a site with a live Cocoon, and when
> you're ready to deploy, serialize it to disk and serve through Apache.
>
> That's why I think the CLI is very important. More than *anything* else,
> it has the potential to vastly widen Cocoon's audience.
>
> So from this perspective, the need is simple. We need the CLI to provide
> as accurate a representation of the live site as possible. Generally
> this means simply mirroring the URI structure to disk.
> Currently, the biggest unmet need is the ability to exclude certain URLs.
> There is usually non-Cocoon-generated content like Javadocs, or other
> parts of the site, which needs to be excluded.
Well, let's get that working well.
Are you willing to test my new version when it's ready?
Regards, Upayavira