Robert Goene wrote:

I received an email from the other participant for the lenya-search project. He told me he would withdraw from the competition (if that is the correct term).

;)

I am working on a new version of the proposal to integrate the feedback Gregor gave me. There are a few blank spots left for me:

Could you please tell me what the current view is on the use of Jackrabbit? It is not completely clear to me what the role of Jackrabbit is in Lenya. Is it only the sitemap and the workflow data or is it supposed to be the general storage mechanism for all the documents?

eventually, jackrabbit will probably replace most, if not all, uses of the file system in lenya. we'd use it to store content, the sitetree, metadata about a document, wf metadata, revisions, ac nodes.

obviously, this will be done in stages, but it makes sense to incorporate it into new designs.

The role of nutch and lucene is not clear to me in the former situation and the latter should imply a different approach to searching.

jackrabbit doesn't have full text search by itself, so lucene would be used to index the repository. nutch is used for crawling external sites, replacing the homegrown crawling code. for instance, the university of zurich crawled all their sites with the lenya crawler to be able to have unified search, no matter whether a site is managed by lenya or not.

If documents are stored in jackrabbit, a local filesystem, or some other xml-storage device and the documents are indexed when they are saved, what is the cooperation between Lucene and Jackrabbit? I don't see it yet. I can imagine a query capability in jackrabbit, but wouldn't this be a replacement of the searching facility?

jackrabbit does have some query capabilities, and lenya will make use of them. queries like: give me all documents last modified this week are ideally suited for jackrabbit.
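
to make the "last modified this week" case concrete, here is a minimal sketch of that kind of property-based query. it is plain python over made-up data, not the jackrabbit api: the `documents` list, the `last_modified` field and the `modified_since` helper are all assumptions for illustration only.

```python
from datetime import datetime, timedelta

# hypothetical stand-in for repository nodes; a real repository would
# expose these values as node properties, not python dicts
documents = [
    {"path": "/index", "last_modified": datetime(2005, 6, 6, 9, 30)},
    {"path": "/about", "last_modified": datetime(2005, 5, 2, 14, 0)},
    {"path": "/news/today", "last_modified": datetime(2005, 6, 8, 17, 45)},
]


def modified_since(docs, cutoff):
    """Return the paths of all documents modified at or after cutoff."""
    return [d["path"] for d in docs if d["last_modified"] >= cutoff]


# fixed "now" so the example is reproducible
now = datetime(2005, 6, 9, 12, 0)
recent = modified_since(documents, now - timedelta(days=7))
```

the point is only that this kind of question is answered from metadata, without looking at the text of a document at all.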

lucene maintains a separate index of all content in the repository, which could be implemented by using

http://incubator.apache.org/jackrabbit/apidocs/org/apache/jackrabbit/core/query/lucene/package-summary.html

as to how the details should work (whether the lenya api notifies lucene about changed documents, or jackrabbit has an observer that calls lucene), i dunno, that is up to you.
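
a rough sketch of the observer variant, assuming nothing about the real apis: `FullTextIndex` is a toy inverted index standing in for lucene, `Repository` is a toy store standing in for jackrabbit, and every name here is hypothetical. the other variant would simply have the lenya api call `index.update(...)` itself after each save.

```python
from collections import defaultdict


class FullTextIndex:
    """Toy inverted index standing in for lucene (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> paths containing it
        self.docs = {}                    # path -> words indexed for it

    def update(self, path, text):
        # drop postings from the previous version of this document
        for word in self.docs.get(path, ()):
            self.postings[word].discard(path)
        words = set(text.lower().split())
        self.docs[path] = words
        for word in words:
            self.postings[word].add(path)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


class Repository:
    """Toy content store standing in for jackrabbit (hypothetical)."""

    def __init__(self):
        self.content = {}
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def save(self, path, text):
        self.content[path] = text
        # notify every registered observer about the changed document
        for callback in self.listeners:
            callback(path, text)


index = FullTextIndex()
repo = Repository()
repo.add_listener(index.update)

repo.save("/index", "welcome to lenya")
repo.save("/index", "welcome to the new site")
```

with the observer registered, the index follows every save, including saves that do not go through the lenya api. that is the main argument for this variant.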

Is the migration to jackrabbit a current issue and if it is, could you give me more information (a discussion thread would do) on the design considerations made?

the repository work has been discussed on

http://wiki.apache.org/lenya/ProposalRepository

and more recently, an integration with the sitetree as a first step was committed to the sandbox. at the same time, the lenya api internals are slowly being rewritten to get rid of direct java.io.File calls, replacing them with avalon sources. this will allow us to migrate further parts of lenya to jackrabbit, when the time comes. work on that has not started yet, though.

I have no idea what to investigate, let alone what to solve for the integration of jackrabbit and nutch.

nutch and jackrabbit would have no integration. the data from nutch would end up in a lucene index, just as the data from jackrabbit would.
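
as a sketch of what "no integration" means: two independent producers write into the same kind of index, and search only ever talks to the index. this is plain python with made-up names (`add_to_index`, `search`, the `source` labels and the example url are assumptions), not the nutch or lucene api.

```python
from collections import defaultdict

# word -> set of (source, location) pairs; a toy stand-in for a lucene index
index = defaultdict(set)


def add_to_index(source, location, text):
    """Index a document, remembering which producer it came from."""
    for word in text.lower().split():
        index[word].add((source, location))


def search(word):
    """Return all hits for a word, regardless of where they came from."""
    return sorted(index.get(word.lower(), ()))


# content managed in the repository
add_to_index("repository", "/news/search", "search proposal accepted")
# content fetched by the crawler from an external site
add_to_index("crawler", "http://example.org/faq", "how to search the site")

hits = search("search")
```

neither producer knows about the other. a unified search falls out of both feeding the same index.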

Thanks a lot. I am looking forward to working on this project!

hope this helps
