Hi Eddie,

> I've also re-created the lucene index plugin as part of our plugin, as we don't use Solr, but our own search application.

One task you might be interested in is making the indexing backends pluggable. See https://issues.apache.org/jira/browse/NUTCH-1047 for details. This would probably involve refactoring all the indexing-related code. There is quite a bit of exploring to do, but I think it would be both interesting and useful.

Regarding the tutorial on distributed mode: it makes sense to run Nutch in pseudo-distributed mode even if you have only one machine available. You can then see the progress of your crawl using the Hadoop task tracker, check the counters for the jobs, etc.

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com