On Sun, 2 Dec 2001, Bernhard Huber wrote:

> Hi,
> There were some mails some time ago about semantic searching and using
> Lucene as an indexing engine.
> For all who are interested in indexing & searching XML, here are some
> notes about the implementation, which is just at the beginning:
>
> I have now implemented some Avalon components for:
> Crawling cocoon-view=content, cocoon-view=links
> Now I'm generating a full HTTP request for each document which should
> get generated.
>
> Indexing XML documents; as a sample I took the /cocoon/documents URI space.
> The Lucene documents have the following fields:
> * url   the URL of the document
> * body  the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
>   field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" searches only for documents having
> a title attribute in an s1 element matching Introduction.
>
> I have some questions, maybe you can help:
> * How can I avoid generating a full HTTP request? As the crawler sits
>   inside of Cocoon and indexes a URI space of the current Cocoon engine,
>   there should be(?) some method for accessing the sitemap and forwarding
>   the crawling request to it, which would speed up the indexing step.
Look at how the CLI environment does it (start at org.apache.cocoon.Main).

Giacomo
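
For readers following the thread, the field scheme described in the quoted message (url, body, one field per element, and element@attribute fields) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the Avalon components: it uses the JDK's SAX parser and a plain map instead of Lucene documents, the class and method names (XmlFieldSketch, extractFields, matches) are made up for the example, and it assumes an element field holds only the text directly inside that element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: maps one XML document to "field name -> values".
// A real indexer would add these values to a Lucene document instead.
public class XmlFieldSketch {

    private static void add(Map<String, List<String>> fields, String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public static Map<String, List<String>> extractFields(String url, String xml) throws Exception {
        final Map<String, List<String>> fields = new LinkedHashMap<>();
        final StringBuilder body = new StringBuilder();
        add(fields, "url", url);

        DefaultHandler handler = new DefaultHandler() {
            // One text buffer per open element (assumption: direct text only).
            private final Deque<StringBuilder> text = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.push(new StringBuilder());
                // Each attribute becomes a field named element@attribute.
                for (int i = 0; i < atts.getLength(); i++) {
                    add(fields, qName + "@" + atts.getQName(i), atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!text.isEmpty()) {
                    text.peek().append(ch, start, length);
                }
                body.append(ch, start, length); // body collects all text
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.pop().toString().trim();
                if (!value.isEmpty()) {
                    add(fields, qName, value); // each element becomes a field
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        add(fields, "body", body.toString().trim().replaceAll("\\s+", " "));
        return fields;
    }

    // Crude stand-in for a fielded query such as s1@title:Introduction;
    // this is a case-insensitive substring test, not Lucene query parsing.
    public static boolean matches(Map<String, List<String>> fields, String field, String term) {
        for (String value : fields.getOrDefault(field, new ArrayList<>())) {
            if (value.toLowerCase().contains(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><s1 title=\"Introduction\"><p>Hello Cocoon</p></s1></document>";
        Map<String, List<String>> fields = extractFields("http://localhost/cocoon/documents/index", xml);
        System.out.println(fields);
        System.out.println(matches(fields, "s1@title", "Introduction"));
    }
}
```

With fields named this way, an unqualified term would be looked up in body, while a qualified one such as s1@title:Introduction would be looked up only in the s1@title field.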