Re: Searching XML content using lucene

Stefano Mazzocchi Mon, 03 Dec 2001 03:47:46 -0800

Bernhard Huber wrote:
> 
> Hi,
> There were some mails regarding semantic searching, and using Lucene (
> http://jakarta.apache.org/lucene )
> as an indexing engine, some time ago.


Yep.

> For all who are interested in indexing & searching XML, some notes about
> the implementation, which is just at the beginning:
> 
> I have now implemented some Avalon components for:
> 1) Crawling cocoon-view=content, cocoon-view=links
> 
> 2) Indexing XML documents; as a sample I took the /cocoon/documents URI
> space.

Wow, sounds very cool. How do you feel about sharing/donating that code?
I'd be very interested in working on that.

> The Lucene documents have the following fields:
> * url: the URL of the document
> * body: the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
> field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" searches only for documents having
> a title attribute in an s1 element matching Introduction.

Ok
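
To make the field layout above concrete, here is a rough sketch of how
such fields could be collected from one XML document. It is not
Bernhard's code, and it deliberately stays away from the Lucene API: it
only uses the SAX parser that ships with the JDK and gathers name/value
pairs in a map, which an indexer could then turn into Lucene fields. The
class name XmlFieldCollector and the method collect() are made up for
illustration, and naming an element's field after the plain element name
is an assumption (the mail only shows the "s1@title" form for attributes).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the name/value pairs that would become
// the fields of one index document (url, body, element, element@attribute).
class XmlFieldCollector extends DefaultHandler {
    private final Map<String, StringBuilder> fields = new LinkedHashMap<>();
    private final Deque<String> open = new ArrayDeque<>();   // stack of open elements
    private final StringBuilder text = new StringBuilder();  // pending character data

    // Appends a value to a field, separating repeated values with a space.
    private void append(String field, String value) {
        String v = value.trim();
        if (v.isEmpty()) return;
        StringBuilder sb = fields.computeIfAbsent(field, k -> new StringBuilder());
        if (sb.length() > 0) sb.append(' ');
        sb.append(v);
    }

    // Assigns buffered text to "body" and to the innermost open element.
    private void flush() {
        String pending = text.toString();
        text.setLength(0);
        append("body", pending);
        if (!open.isEmpty()) append(open.peek(), pending);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flush();                       // text seen so far belongs to the parent
        open.push(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            append(qName + "@" + atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may deliver text in several chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flush();                       // text seen so far belongs to this element
        open.pop();
    }

    // Parses the XML string and returns field name -> field text.
    static Map<String, String> collect(String url, String xml) {
        XmlFieldCollector handler = new XmlFieldCollector();
        handler.append("url", url);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse " + url, e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        handler.fields.forEach((name, value) -> result.put(name, value.toString()));
        return result;
    }
}
```

With a document such as <s1 title="Introduction">, the map gets an
"s1@title" entry, which is the field a query like "s1@title:Introduction"
would be matched against, while a bare "Introduction" would go to "body".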
 
> I have some questions; maybe someone can help:
> * How can I avoid generating a full HTTP request? As the crawler sits
> inside of Cocoon and is indexing
> a URI space of the current Cocoon engine, there should be(?) some
> method for accessing the
> sitemap and forwarding the crawling request to it, which will speed up the
> indexing step.

The Cocoon CLI does crawling internally without the overhead of HTTP
requests.

Follow the flow at Cocoon.main() to see how that is done.
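
Just to illustrate the idea, here is a minimal sketch of a crawl loop
that does not care where the links come from. It is not the actual CLI
code and it does not use any Cocoon API: the linksOf function is a
stand-in (an assumption) for whatever in-process call returns the links
view of a URI, and UriSpaceCrawler is a made-up name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical breadth-first crawler over a URI space.
class UriSpaceCrawler {
    // linksOf is a placeholder for an in-process lookup of a page's links,
    // so no HTTP request is needed to follow them.
    static Set<String> crawl(String start, Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>();  // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String uri = queue.remove();
            for (String link : linksOf.apply(uri)) {
                // add() returns false for URIs already seen, which stops cycles
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

Each URI returned by crawl() could then be fetched through the content
view and handed to the indexer.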

Hope this helps.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------


