On Tue, Apr 7, 2009 at 1:07 AM, Susam Pal <susam....@gmail.com> wrote:
> The inline documentation of 'conf/crawl-tool.xml' mentions:
>
> <!-- Do not modify this file directly. Instead, copy entries that you -->
> <!-- wish to modify from this file into nutch-site.xml and change them -->
> <!-- there. If nutch-site.xml does not already exist, create it. -->
>
> However, I don't see any way of overriding the properties defined in
> 'conf/crawl-tool.xml' as 'conf/nutch-site.xml' is added to the
> configuration before 'conf/crawl-tool.xml' in the code. Here are the
> relevant code snippets:
>
> src/org/apache/nutch/crawl/Crawl.java (Lines 57 to 59) :
>
> Configuration conf = NutchConfiguration.create();
> conf.addResource("crawl-tool.xml");
> JobConf job = new NutchJob(conf);
>
> src/org/apache/nutch/tool/NutchConfiguration.java (Lines 39 to 40) :
>
> conf.addResource("nutch-default.xml");
> conf.addResource("nutch-site.xml");
>
> So, shouldn't that XML comment be removed from 'conf/crawl-tool.xml' ?
>
> Regards,
> Susam Pal
>
I have uploaded a patch for this at:

https://issues.apache.org/jira/browse/NUTCH-735

Instead of changing the XML comments, I have changed the code so that
it behaves the way the XML comments describe.

Regards,
Susam Pal
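
To make the ordering problem concrete, here is a small stand-alone sketch. It does not use the Nutch or Hadoop classes; it only models the rule that a resource added later overrides one added earlier. The class name, the helper methods, and the property values are made up for illustration, and the second ordering is just one arrangement that would match the XML comment, not necessarily what the patch does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResourceOrderSketch {

    // Hypothetical property and values, chosen only for illustration.
    static final String KEY = "urlfilter.regex.file";
    static final Map<String, String> NUTCH_DEFAULT =
            Collections.singletonMap(KEY, "regex-urlfilter.txt");
    static final Map<String, String> NUTCH_SITE =
            Collections.singletonMap(KEY, "my-urlfilter.txt");
    static final Map<String, String> CRAWL_TOOL =
            Collections.singletonMap(KEY, "crawl-urlfilter.txt");

    // Toy model of a layered configuration: resources are merged in load
    // order, so a value from a later resource replaces an earlier one.
    static Map<String, String> merge(List<Map<String, String>> resourcesInLoadOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> resource : resourcesInLoadOrder) {
            merged.putAll(resource);
        }
        return merged;
    }

    // Order described in the quoted message: default, site, then crawl-tool.
    static Map<String, String> currentOrder() {
        return merge(Arrays.asList(NUTCH_DEFAULT, NUTCH_SITE, CRAWL_TOOL));
    }

    // An order under which the XML comment would hold: site is loaded last.
    static Map<String, String> siteLastOrder() {
        return merge(Arrays.asList(NUTCH_DEFAULT, CRAWL_TOOL, NUTCH_SITE));
    }

    public static void main(String[] args) {
        System.out.println("current order:   " + currentOrder().get(KEY));
        System.out.println("site-last order: " + siteLastOrder().get(KEY));
    }
}
```

With the current order, the value from crawl-tool.xml wins and the entry copied into nutch-site.xml is ignored. With the site file loaded last, the site entry takes effect.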