Log when db.max configuration limits reached
--------------------------------------------
Key: NUTCH-182
URL: http://issues.apache.org/jira/browse/NUTCH-182
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.8-dev
Reporter: Matt Kangas
Priority: Trivial
Followup to http://www.nabble.com/Re%3A-Can%27t-index-some-pages-p2480833.html
There are three "db.max" parameters currently in nutch-default.xml:
* db.max.outlinks.per.page
* db.max.anchor.length
* db.max.inlinks
Values that are too low can leave a site under-crawled. However, nothing is
currently written to the log when these limits are hit, so users have to guess
when they need to raise these values.
I suggest that we add three new log messages at the appropriate points:
* "Exceeded db.max.outlinks.per.page for URL "
* "Exceeded db.max.anchor.length for URL "
* "Exceeded db.max.inlinks for URL "
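
As a rough illustration of the proposal, here is a minimal standalone sketch,
not actual Nutch code. The class DbMaxLimits, its method names, and the use of
java.util.logging are all assumptions made for this example. It also assumes
each limit is enforced by truncating or dropping the excess; in Nutch itself
the limits would be read from the configuration, and the real enforcement
points may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: shows where the three proposed
// log messages could be emitted when a db.max.* limit is hit.
public class DbMaxLimits {
    private static final Logger LOG =
        Logger.getLogger(DbMaxLimits.class.getName());

    // Keeps at most maxOutlinks links; logs only when links are dropped.
    public static List<String> limitOutlinks(String url, List<String> outlinks,
                                             int maxOutlinks) {
        if (outlinks.size() <= maxOutlinks) {
            return outlinks;
        }
        LOG.warning("Exceeded db.max.outlinks.per.page for URL " + url);
        return new ArrayList<>(outlinks.subList(0, maxOutlinks));
    }

    // Truncates anchor text to maxAnchorLength characters; logs on truncation.
    public static String limitAnchor(String url, String anchor,
                                     int maxAnchorLength) {
        if (anchor.length() <= maxAnchorLength) {
            return anchor;
        }
        LOG.warning("Exceeded db.max.anchor.length for URL " + url);
        return anchor.substring(0, maxAnchorLength);
    }

    // Returns true if another inlink may be recorded; logs when the limit
    // has already been reached and the inlink would be dropped.
    public static boolean canAddInlink(String url, int currentInlinks,
                                       int maxInlinks) {
        if (currentInlinks < maxInlinks) {
            return true;
        }
        LOG.warning("Exceeded db.max.inlinks for URL " + url);
        return false;
    }
}
```

Logging only on the path where data is actually discarded keeps the log quiet
for pages within the limits, so each message is a direct hint that the
corresponding value may need raising.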