The inline documentation of 'conf/crawl-tool.xml' mentions:

<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into nutch-site.xml and change them -->
<!-- there.  If nutch-site.xml does not already exist, create it.      -->

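However, I don't see any way to override the properties defined in
'conf/crawl-tool.xml', because 'conf/nutch-site.xml' is added to the
configuration before 'conf/crawl-tool.xml' in the code. Since a resource
added later overrides the values set by earlier ones, the properties in
'conf/crawl-tool.xml' take precedence over those in 'conf/nutch-site.xml'.
Here are the relevant code snippets: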

src/org/apache/nutch/crawl/Crawl.java (lines 57 to 59):

    Configuration conf = NutchConfiguration.create();
    conf.addResource("crawl-tool.xml");
    JobConf job = new NutchJob(conf);

src/org/apache/nutch/tool/NutchConfiguration.java (lines 39 to 40):

    conf.addResource("nutch-default.xml");
    conf.addResource("nutch-site.xml");
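
To make the ordering argument concrete, here is a minimal, self-contained
sketch that models "the last resource added wins". It is not Hadoop's
Configuration class; it only assumes that later resources override earlier
ones, and it ignores everything else Configuration does. The class name
LayeredConfigSketch and the property name example.property are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of layered configuration loading: resources are
 * applied in the order they are added, and a later resource overrides any
 * value set by an earlier one. Not Hadoop's Configuration class.
 */
public class LayeredConfigSketch {
    // Resource name -> its key/value pairs (stands in for the XML files).
    private final Map<String, Map<String, String>> available;
    // Resource names in the order addResource() was called.
    private final List<String> order = new ArrayList<>();

    public LayeredConfigSketch(Map<String, Map<String, String>> available) {
        this.available = available;
    }

    public void addResource(String name) {
        order.add(name);
    }

    /** Returns the value from the last-added resource defining key, or null. */
    public String get(String key) {
        String value = null;
        for (String name : order) {
            Map<String, String> props = available.getOrDefault(name, Map.of());
            if (props.containsKey(key)) {
                value = props.get(key); // a later resource overwrites earlier ones
            }
        }
        return value;
    }

    /** Hypothetical file contents: each file sets the same property. */
    public static Map<String, Map<String, String>> sampleFiles() {
        Map<String, Map<String, String>> files = new HashMap<>();
        files.put("nutch-default.xml", Map.of("example.property", "default"));
        files.put("nutch-site.xml", Map.of("example.property", "site"));
        files.put("crawl-tool.xml", Map.of("example.property", "crawl-tool"));
        return files;
    }

    public static void main(String[] args) {
        // Same order as NutchConfiguration.create() followed by Crawl.java.
        LayeredConfigSketch conf = new LayeredConfigSketch(sampleFiles());
        conf.addResource("nutch-default.xml");
        conf.addResource("nutch-site.xml");
        conf.addResource("crawl-tool.xml");
        System.out.println(conf.get("example.property")); // prints "crawl-tool"
    }
}
```

Swapping the last two addResource() calls in this model makes get() return
"site", which is the behaviour the comment in 'conf/crawl-tool.xml' seems to
assume.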

So, shouldn't that XML comment be removed from 'conf/crawl-tool.xml'?

Regards,
Susam Pal
