Bernhard Huber wrote:
>
> Hi,
> There were some mails regarding semantic searching, and using Lucene
> ( http://jakarta.apache.org/lucene ) as an indexing engine, some time ago.
Yep.

> For all who are interested in indexing & searching XML, some notes about
> the implementation, which is just at the beginning:
>
> I have now implemented some Avalon components for:
> 1) Crawling cocoon-view=content, cocoon-view=links
>
> 2) Indexing XML documents; as a sample I took the /cocoon/documents URI
> space.

Wow, sounds very cool. How do you feel about sharing/donating that code?
I'd be very interested in working on that.

> The Lucene documents have the following fields:
> * url   the URL of the document
> * body  the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
>   field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" searches only for documents having
> an attribute title in an s1 element matching Introduction.

Ok.

> I have a question, maybe someone can help:
> * How can I avoid generating a full HTTP request? As the crawler sits
>   inside of Cocoon and is indexing a URI space of the current Cocoon
>   engine, there should be(?) some method of accessing the sitemap and
>   forwarding it the crawling request, which would speed up the
>   indexing step.

The Cocoon CLI does crawling internally without the overhead of HTTP
requests. Follow the flow at Cocoon.main() to see how that is done.

Hope this helps.

-- 
Stefano Mazzocchi <[EMAIL PROTECTED]>
One must still have chaos in oneself to be able to give birth to a
dancing star.
                                             Friedrich Nietzsche