On Sun, 2 Dec 2001, Bernhard Huber wrote:

> Hi,
> There were some mails about semantic searching, and about using Lucene as
> an indexing engine, some time ago.
> For all who are interested in indexing & searching XML, here are some notes
> about the implementation, which is just at the beginning:
>
> I have now implemented some Avalon components for:
> Crawling, using cocoon-view=content and cocoon-view=links.
> Right now I issue a full HTTP request for each document that should get
> generated.
>
> Indexing XML documents; as a sample I took the /cocoon/documents URI space.
> The Lucene documents have the following fields:
> * url: the URL of the document
> * body: the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
> field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" finds only documents that have a
> title attribute on an s1 element matching Introduction.
>
> I have a question, maybe you can help:
> * How can I avoid generating a full HTTP request? As the crawler sits
> inside Cocoon and indexes a URI space of the current Cocoon engine,
> there should be(?) some method of accessing the sitemap and forwarding
> the crawling request to it, which would speed up the indexing step.

Look at how the CLI environment does it (start at org.apache.cocoon.Main).

Giacomo
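
For readers following the field layout in the quoted message (url, body, one field per element, one per element@attribute), here is a rough sketch of how such a mapping could look. It is not Bernhard's code and it does not use Lucene or Cocoon at all: the class XmlFieldSketch, its methods toFields and matches, and the rule that an element field holds only that element's direct text are all assumptions made for illustration. The matches method is a plain substring check standing in for a real Lucene query, just to show the "body by default, field:term otherwise" behaviour.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper; not part of Cocoon or Lucene. */
public class XmlFieldSketch {

    private static void add(Map<String, List<String>> fields, String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Flattens one XML document into field name -> values. */
    public static Map<String, List<String>> toFields(String url, String xml) throws Exception {
        final Map<String, List<String>> fields = new LinkedHashMap<>();
        final StringBuilder body = new StringBuilder();
        // One text buffer per currently open element (assumption: direct text only).
        final Deque<StringBuilder> open = new ArrayDeque<>();
        add(fields, "url", url);

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.push(new StringBuilder());
                // Each attribute becomes a field named element@attribute.
                for (int i = 0; i < atts.getLength(); i++) {
                    add(fields, qName + "@" + atts.getQName(i), atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                body.append(ch, start, length);
                if (!open.isEmpty()) {
                    open.peek().append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String text = open.pop().toString().trim();
                if (!text.isEmpty()) {
                    add(fields, qName, text);
                }
                body.append(' '); // keep text of adjacent elements apart
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        add(fields, "body", body.toString().trim().replaceAll("\\s+", " "));
        return fields;
    }

    /** Simplified stand-in for a query: "term" checks body, "field:term" checks that field. */
    public static boolean matches(Map<String, List<String>> fields, String query) {
        int colon = query.indexOf(':');
        String field = colon < 0 ? "body" : query.substring(0, colon);
        String term = (colon < 0 ? query : query.substring(colon + 1)).toLowerCase();
        for (String value : fields.getOrDefault(field, new ArrayList<>())) {
            if (value.toLowerCase().contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With a document such as <s1 title="Introduction"><p>Hello</p></s1>, the query s1@title:Introduction matches through the attribute field, while a bare Introduction does not, because the body field only holds element text. A real index would tokenize and analyze the values instead of doing substring checks.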

