Prasenjit Mukherjee wrote:
I think Nutch has a distributed Lucene implementation. I could have
used Nutch straight away, but I have a different crawler, and I also
don't want to use NDFS (which is what Nutch uses). What I proposed
earlier is basically based on the MapReduce paradigm, which Nutch uses
as well.
It would be nice to see some articles specifically detailing the
distributed architecture used in Nutch.
A few comments:
* you can use your own crawler, and then write only some glue code to
convert the output of that crawler to the format that Nutch uses (a
rough sketch of such a converter follows below).
* Nutch can be run in a so-called "local" mode, without using NDFS.
* the core map-reduce and I/O functionality has been split out into its
own project, Hadoop, where development is taking place at a furious rate
;-) This code is completely independent of Nutch or Lucene, and you can
implement your own data processing on top of this framework (see the
second sketch below).
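
To illustrate the glue-code idea, here is a minimal sketch. It assumes
a made-up crawler dump format (one tab-separated "<url>\t<content>"
record per line) and simply repackages it as a Hadoop SequenceFile
keyed by URL; the class name CrawlDumpConverter is my own invention,
and the actual Nutch segment classes you would target depend on the
Nutch version you run:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Converts a custom crawler's dump (one "<url>\t<content>" line per
// record, a hypothetical format) into a SequenceFile keyed by URL,
// which downstream map-reduce jobs can then process.
public class CrawlDumpConverter {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, Text.class);
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String[] rec = line.split("\t", 2);  // url, page content
        if (rec.length == 2) {
          writer.append(new Text(rec[0]), new Text(rec[1]));
        }
      }
    } finally {
      in.close();
      writer.close();
    }
  }
}

From there a further job could turn these records into proper Nutch
segments for parsing and indexing.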
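
And to show what "your own data processing" on Hadoop can look like,
here is a minimal map-reduce sketch that builds an inverted index
(term -> list of document ids) from such tab-separated records. It is
written against the classic org.apache.hadoop.mapred API; exact class
names have shifted between Hadoop releases, so treat it as a sketch,
not as the actual Nutch indexing code:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Builds a simple inverted index: term -> comma-separated doc ids.
public class InvertedIndex {

  public static class IndexMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Input line format (assumed): "<docId>\t<document text>"
      String[] parts = value.toString().split("\t", 2);
      if (parts.length < 2) return;
      Text docId = new Text(parts[0]);
      StringTokenizer tok = new StringTokenizer(parts[1]);
      while (tok.hasMoreTokens()) {
        // Emit one (term, docId) pair per token.
        output.collect(new Text(tok.nextToken().toLowerCase()), docId);
      }
    }
  }

  public static class IndexReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text term, Iterator<Text> docIds,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Concatenate all doc ids that contain this term.
      StringBuilder postings = new StringBuilder();
      while (docIds.hasNext()) {
        if (postings.length() > 0) postings.append(',');
        postings.append(docIds.next().toString());
      }
      output.collect(term, new Text(postings.toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(InvertedIndex.class);
    conf.setJobName("inverted-index");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IndexMapper.class);
    conf.setReducerClass(IndexReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

You would submit it with something like:
bin/hadoop jar myjob.jar InvertedIndex <input dir> <output dir>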
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com