Re: Google Summer of Code

Gregor J. Rothfuss Mon, 06 Jun 2005 14:18:02 -0700

Robert Goene wrote:

I received an email from the other participant for the lenya-search project. He told me he would withdraw from the competition (if that is the correct term).

;)

I am working on a new version of the proposal to integrate the feedback Gregor gave me. There are a few blank spots left for me:
Could you please tell me what the current view is on the use of Jackrabbit? It is not completely clear to me what the role of Jackrabbit is in Lenya. Is it only the sitemap and the workflow data or is it supposed to be the general storage mechanism for all the documents?

eventually, jackrabbit will probably replace most, if not all, uses of the file system in lenya. we'd use it to store content, the sitetree, metadata about a document, wf metadata, revisions, ac nodes.

obviously, this will be done in stages, but it makes sense to incorporate it into new designs.

The role of nutch and lucene is not clear to me in the former situation and the latter should imply a different approach to searching.

jackrabbit doesn't have full text search by itself, so lucene would be used to index the repository. nutch is used for crawling external sites, replacing the homegrown crawling code. for instance, the university of zurich crawled all their sites with the lenya crawler to be able to have unified search, no matter whether a site is managed by lenya or not.

If documents are stored in jackrabbit, a local filesystem, or some other xml-storage device and the documents are indexed when they are saved, what is the cooperation between Lucene and Jackrabbit? I don't see it yet. I can imagine a query capability in jackrabbit, but wouldn't this be a replacement of the searching facility?

jackrabbit does have some query capabilities, and lenya will make use of them. queries like: give me all documents last modified this week are ideally suited for jackrabbit.
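To make the distinction concrete, here is a minimal sketch of a metadata query of the "last modified this week" kind, written against plain Java collections. `DocumentNode` and `MetadataQuery` are hypothetical stand-ins invented for illustration; this is not the Jackrabbit or JCR query API, only the shape of the question being asked (a filter on a property, with no full text involved).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a repository node carrying a last-modified property.
final class DocumentNode {
    final String path;
    final Instant lastModified;

    DocumentNode(String path, Instant lastModified) {
        this.path = path;
        this.lastModified = lastModified;
    }
}

// Hypothetical helper: answers a structured question about metadata only.
final class MetadataQuery {
    // Returns the paths of all nodes modified no earlier than (now - window).
    static List<String> modifiedWithin(List<DocumentNode> nodes, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        List<String> paths = new ArrayList<>();
        for (DocumentNode node : nodes) {
            if (!node.lastModified.isBefore(cutoff)) {
                paths.add(node.path);
            }
        }
        return paths;
    }
}
```

Calling `MetadataQuery.modifiedWithin(nodes, Instant.now(), Duration.ofDays(7))` would list this week's changes. A query like this never looks at document text, which is why it fits the repository rather than a full text index.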

lucene maintains a separate index of all content in the repository, which could be implemented by using


http://incubator.apache.org/jackrabbit/apidocs/org/apache/jackrabbit/core/query/lucene/package-summary.html

as to how the details should work (whether the lenya api notifies lucene about changed documents, or jackrabbit has an observer that calls lucene), i dunno, that is up to you.
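The observer variant could be sketched roughly as follows. All names here (`SaveListener`, `SketchRepository`, `SketchIndexer`) are hypothetical and use only the Java standard library; none of this is the Jackrabbit observation API or the Lucene API. The point is just the wiring: the repository announces a save, and the indexer reacts without the caller knowing about it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener interface; not the JCR observation API.
interface SaveListener {
    void documentSaved(String path, String content);
}

// Hypothetical stand-in for the repository: stores content and notifies listeners on save.
class SketchRepository {
    private final Map<String, String> documents = new HashMap<>();
    private final List<SaveListener> listeners = new ArrayList<>();

    void addListener(SaveListener listener) {
        listeners.add(listener);
    }

    void save(String path, String content) {
        documents.put(path, content);
        // Every registered observer hears about the change after it is stored.
        for (SaveListener listener : listeners) {
            listener.documentSaved(path, content);
        }
    }

    String get(String path) {
        return documents.get(path);
    }
}

// Hypothetical stand-in for the indexing side: remembers the latest content per path.
class SketchIndexer implements SaveListener {
    final Map<String, String> indexed = new HashMap<>();

    @Override
    public void documentSaved(String path, String content) {
        // A real indexer would analyze the content and update a full text index here.
        indexed.put(path, content);
    }
}
```

In the other variant, the lenya api would call the indexer directly after each save instead of registering it as a listener. The observer form keeps the index up to date no matter which code path wrote to the repository, at the cost of one more moving part.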

Is the migration to jackrabbit a current issue and if it is, could you give me more information (a discussion thread would do) on the design considerations made?


the repository work has been discussed on

http://wiki.apache.org/lenya/ProposalRepository

and more recently, an integration with the sitetree as a first step was committed to the sandbox. at the same time, the lenya api internals are slowly being rewritten to get rid of direct java.io.File calls, replacing them with avalon sources. this will allow us to migrate further parts of lenya to jackrabbit, when the time comes. work on that has not started yet, though.

I have no idea what to investigate, let alone what to solve for the integration of jackrabbit and nutch.

nutch and jackrabbit would have no integration. the data from nutch would end up in a lucene index, just as the data from jackrabbit would.
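As a rough illustration of two independent sources feeding one index, here is a tiny inverted index in plain Java. `SharedIndex` is a hypothetical stand-in for a lucene index, and the `repo:` and `crawl:` id prefixes in the usage note are assumptions made up for the example; neither comes from lenya, nutch, or lucene.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a full text index: maps each lowercased term
// to the ids of the documents that contain it.
class SharedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text on anything that is not a letter or digit and
    // records the document id under every resulting term.
    void add(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Returns the ids of all documents containing the term, whatever their source.
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(hits);
    }
}
```

A repository indexer might call `index.add("repo:/index", text)` while a crawler calls `index.add("crawl:http://example.org/", text)`. Neither knows about the other, and a single `index.search("lenya")` sees both, which is all the "integration" the two would need.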

Thanks a lot. I am looking forward to working on this project!


hope this helps
