Hi Eddie,

*I've also re-created the Lucene index plugin as part of our plugin, as we
don't use Solr, but our own search application.*

One task you could be interested in is to make the indexing backends
pluggable. See https://issues.apache.org/jira/browse/NUTCH-1047 for
details. This would probably involve refactoring all the indexing-related
code. There is quite a bit of exploring to do, but I think this would be both
interesting and useful.

Regarding the tutorial on distributed mode: it makes sense to run Nutch in
pseudo-distributed mode even if you have only one machine available. You
can then follow the progress of your crawl using the Hadoop task tracker,
check the counters for the jobs, etc.

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
