Prasenjit Mukherjee wrote:
I think Nutch has a distributed Lucene implementation. I could have
used Nutch straight away, but I have a different crawler, and I also
don't want to use NDFS (which is what Nutch uses). What I proposed
earlier is basically based on the MapReduce paradigm, which Nutch uses
as well.
It would be nice to see some articles specifically detailing the
distributed architecture used in Nutch.
A few comments:
* you can use your own crawler, and then write only some glue code to
convert the output of that crawler to the format that Nutch uses (a
rough sketch of such a converter follows below).
* Nutch can be run in a so-called "local" mode, without using NDFS.
* the core map-reduce and I/O functionality has been split out into its
own project, Hadoop, where development is taking place at a furious rate
;-) This code is completely independent of Nutch or Lucene, and you can
implement your own data processing on top of this framework (see the
second sketch below).
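
To illustrate the glue-code idea, here is a minimal sketch. It assumes
a made-up crawler dump format (one tab-separated "<url>\t<content>"
record per line) and simply repackages it as a Hadoop SequenceFile
keyed by URL; the class name CrawlDumpConverter is my own invention,
and the actual Nutch segment classes you would target depend on the
Nutch version you run:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Converts a custom crawler's dump (one "<url>\t<content>" line per
// record, a hypothetical format) into a SequenceFile keyed by URL,
// which downstream map-reduce jobs can then process.
public class CrawlDumpConverter {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, Text.class);
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String[] rec = line.split("\t", 2);  // url, page content
        if (rec.length == 2) {
          writer.append(new Text(rec[0]), new Text(rec[1]));
        }
      }
    } finally {
      in.close();
      writer.close();
    }
  }
}

From there a further job could turn these records into proper Nutch
segments for parsing and indexing.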
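
And to show what "your own data processing" on Hadoop can look like,
here is a minimal map-reduce sketch that builds an inverted index
(term -> list of document ids) from such tab-separated records. It is
written against the classic org.apache.hadoop.mapred API; exact class
names have shifted between Hadoop releases, so treat it as a sketch,
not as the actual Nutch indexing code:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Builds a simple inverted index: term -> comma-separated doc ids.
public class InvertedIndex {

  public static class IndexMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Input line format (assumed): "<docId>\t<document text>"
      String[] parts = value.toString().split("\t", 2);
      if (parts.length < 2) return;
      Text docId = new Text(parts[0]);
      StringTokenizer tok = new StringTokenizer(parts[1]);
      while (tok.hasMoreTokens()) {
        // Emit one (term, docId) pair per token.
        output.collect(new Text(tok.nextToken().toLowerCase()), docId);
      }
    }
  }

  public static class IndexReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text term, Iterator<Text> docIds,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Concatenate all doc ids that contain this term.
      StringBuilder postings = new StringBuilder();
      while (docIds.hasNext()) {
        if (postings.length() > 0) postings.append(',');
        postings.append(docIds.next().toString());
      }
      output.collect(term, new Text(postings.toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(InvertedIndex.class);
    conf.setJobName("inverted-index");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IndexMapper.class);
    conf.setReducerClass(IndexReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

You would submit it with something like:
bin/hadoop jar myjob.jar InvertedIndex <input dir> <output dir>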
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com