Prasenjit Mukherjee wrote:
I think Nutch has a distributed Lucene implementation. I could have used Nutch straight away, but I have a different crawler, and I also don't want to use NDFS (which Nutch uses). What I proposed earlier is essentially based on the MapReduce paradigm, which Nutch uses as well.

It would be nice to see some articles detailing the distributed architecture used in Nutch.


A few comments:

* You can use your own crawler and write only some glue code to convert that crawler's output into the format that Nutch uses (a sketch follows after this list).

* Nutch can be run in a so-called "local" mode, without using NDFS (a sample configuration is shown below).

* The core map-reduce and I/O functionality has been split out into its own project, Hadoop, where development is taking place at a furious rate ;-) This code is completely independent of Nutch and Lucene, so you can implement your own data processing on top of this framework (a minimal job is sketched at the end of this list).
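For the glue code, here is a minimal sketch, assuming your crawler's output can be reduced to <url, raw content> pairs and written as a Hadoop SequenceFile. Nutch's own segment classes (e.g. Content) have version-specific constructors, so this only shows the general shape; the class name and output path are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical converter: writes crawler output as <url, content>
// pairs into a SequenceFile that a map-reduce job can then read.
public class CrawlOutputConverter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[0]);  // e.g. "converted/part-00000"
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, Text.class, BytesWritable.class);
    try {
      // Replace this loop with iteration over your crawler's output.
      String url = "http://example.com/";
      byte[] content = "hello".getBytes("UTF-8");
      writer.append(new Text(url), new BytesWritable(content));
    } finally {
      writer.close();
    }
  }
}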
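For local mode, a hedged configuration example: in Hadoop-based Nutch versions this amounts to pointing the file system and the job tracker at the local machine in hadoop-site.xml (property names and values differ slightly across releases; older ones use the literal value "local" for fs.default.name):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>  <!-- local file system instead of NDFS/DFS -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>     <!-- run map-reduce jobs in-process -->
  </property>
</configuration>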
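To illustrate the last point, a minimal word-count job against Hadoop's classic org.apache.hadoop.mapred API; the API has changed across releases, so treat this as a sketch, not a definitive example:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // Mapper: emits <word, 1> for every token in an input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tok = new StringTokenizer(value.toString());
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Compile this into a jar and run it with bin/hadoop jar wordcount.jar WordCount <input> <output>; in local mode the paths are ordinary directories on the local file system.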

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


