[ 
https://issues.apache.org/jira/browse/NUTCH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma closed NUTCH-182.
-------------------------------

    Resolution: Won't Fix

Bulk close of legacy issues:
http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira

> Log when db.max configuration limits reached
> --------------------------------------------
>
>                 Key: NUTCH-182
>                 URL: https://issues.apache.org/jira/browse/NUTCH-182
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Matt Kangas
>            Priority: Trivial
>         Attachments: LinkDb.java.patch, ParseData.java.patch
>
>
> Follow-up to http://www.nabble.com/Re%3A-Can%27t-index-some-pages-p2480833.html
> There are three "db.max" parameters currently in nutch-default.xml:
>  * db.max.outlinks.per.page
>  * db.max.anchor.length
>  * db.max.inlinks
> Values that are too low can leave a site under-crawled. However, nothing is
> currently written to the log when these limits are hit, so users have to
> guess when they need to raise these values.
> I suggest that we add three new log messages at the appropriate points:
>  * "Exceeded db.max.outlinks.per.page for URL "
>  * "Exceeded db.max.anchor.length for URL "
>  * "Exceeded db.max.inlinks for URL "
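
The proposal above could be sketched roughly as follows. This is not the
attached LinkDb.java.patch or ParseData.java.patch; the class name
DbMaxLimitSketch and both helper methods are hypothetical, and the sketch
assumes the limits are non-negative and that truncation is the behaviour
when a limit is hit. It uses java.util.logging only to stay self-contained.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of Nutch: shows where the proposed
// "Exceeded db.max.*" messages could be emitted.
class DbMaxLimitSketch {
    private static final Logger LOG =
        Logger.getLogger(DbMaxLimitSketch.class.getName());

    // Keeps at most maxOutlinks entries and logs when the list is cut.
    // Assumes maxOutlinks >= 0.
    static List<String> capOutlinks(String url, List<String> outlinks,
                                    int maxOutlinks) {
        if (outlinks.size() <= maxOutlinks) {
            return outlinks;
        }
        LOG.info("Exceeded db.max.outlinks.per.page for URL " + url);
        return outlinks.subList(0, maxOutlinks);
    }

    // Keeps at most maxAnchorLength characters and logs when the anchor
    // text is cut. Assumes maxAnchorLength >= 0.
    static String capAnchor(String url, String anchor, int maxAnchorLength) {
        if (anchor.length() <= maxAnchorLength) {
            return anchor;
        }
        LOG.info("Exceeded db.max.anchor.length for URL " + url);
        return anchor.substring(0, maxAnchorLength);
    }
}
```

A check for db.max.inlinks would follow the same pattern: compare the count
against the configured limit and log once before dropping the excess.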

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira