The inline documentation of 'conf/crawl-tool.xml' says:

<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into nutch-site.xml and change them -->
<!-- there. If nutch-site.xml does not already exist, create it. -->
However, I don't see any way of overriding the properties defined in 'conf/crawl-tool.xml'. The code adds 'conf/nutch-site.xml' to the configuration before 'conf/crawl-tool.xml', and a resource added later overrides one added earlier. As a result, any value set in 'conf/crawl-tool.xml' wins over the same property in 'conf/nutch-site.xml'. Here are the relevant code snippets:

src/org/apache/nutch/crawl/Crawl.java (lines 57 to 59):

    Configuration conf = NutchConfiguration.create();
    conf.addResource("crawl-tool.xml");
    JobConf job = new NutchJob(conf);

src/org/apache/nutch/tool/NutchConfiguration.java (lines 39 to 40):

    conf.addResource("nutch-default.xml");
    conf.addResource("nutch-site.xml");

So shouldn't that XML comment be removed from 'conf/crawl-tool.xml'?

Regards,
Susam Pal
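To make the ordering problem concrete, here is a minimal sketch of "last resource added wins" layering. It is a hypothetical stand-in written with plain Java collections, not Hadoop's `Configuration` class. The property name `example.property` and its values are invented for illustration. It compares the order used in the snippets above (nutch-site.xml, then crawl-tool.xml) with the reverse order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of layered configuration: resources are consulted in the
// order they were added, and a later resource overrides an earlier one for the
// same key. This mimics, but is not, org.apache.hadoop.conf.Configuration.
public class LayeredConfigDemo {

    static class LayeredConfig {
        private final List<Map<String, String>> resources = new ArrayList<>();

        void addResource(Map<String, String> resource) {
            resources.add(resource);
        }

        String get(String key) {
            String value = null;
            // Walk resources in insertion order; the last one defining the key wins.
            for (Map<String, String> r : resources) {
                if (r.containsKey(key)) {
                    value = r.get(key);
                }
            }
            return value;
        }
    }

    // Builds a one-entry "resource" standing in for an XML file.
    static Map<String, String> resource(String key, String value) {
        Map<String, String> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    // Hypothetical property name, for illustration only.
    static final String KEY = "example.property";

    static String resolve(boolean siteBeforeCrawlTool) {
        LayeredConfig conf = new LayeredConfig();
        conf.addResource(resource(KEY, "from-nutch-default"));
        if (siteBeforeCrawlTool) {
            // Order in the quoted code: nutch-site.xml, then crawl-tool.xml.
            conf.addResource(resource(KEY, "from-nutch-site"));
            conf.addResource(resource(KEY, "from-crawl-tool"));
        } else {
            // Reverse order: the user's nutch-site.xml is added last.
            conf.addResource(resource(KEY, "from-crawl-tool"));
            conf.addResource(resource(KEY, "from-nutch-site"));
        }
        return conf.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println(resolve(true));   // from-crawl-tool
        System.out.println(resolve(false));  // from-nutch-site
    }
}
```

With the current order, the nutch-site.xml value is shadowed. Only when crawl-tool.xml is added before nutch-site.xml does the advice in the XML comment hold. Hadoop's real `Configuration` also lets a resource mark a property `final`, which blocks later resources from changing it. This sketch does not model that, and whether it helps here depends on the Hadoop version bundled with Nutch.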