[ https://issues.apache.org/jira/browse/NUTCH-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel reassigned NUTCH-2737:
--------------------------------------

    Assignee: Sebastian Nagel

> Generator: count and log reason of rejections during selection
> --------------------------------------------------------------
>
>                 Key: NUTCH-2737
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2737
>             Project: Nutch
>          Issue Type: Improvement
>          Components: generator
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.17
>
>
> During the map phase of the selection step, the generator rejects many
> (usually most) items for various reasons:
> - not yet time for a refetch (returned by the fetch scheduler)
> - generator score too low
> - status does not match restrict status
> - Jexl expression not matched
> and some more. It would be useful if the reasons were counted and logged,
> esp. when the CrawlDb gets bigger and multiple options to restrict the
> selection are used.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
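The per-reason counting requested in the issue can be illustrated with a minimal Java sketch. The class `RejectionCounter`, its `Reason` enum and its methods are hypothetical names chosen for illustration only; they are not part of Nutch's Generator. The sketch tallies one count per rejection reason in an `EnumMap` and logs the non-zero totals.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical sketch: tallies why CrawlDb entries are rejected during
 * generator selection. Names and structure are illustrative assumptions,
 * not Nutch's actual Generator code.
 */
public class RejectionCounter {

    /** Illustrative rejection reasons, taken from the issue description. */
    public enum Reason {
        SCHEDULE_REJECTED,  // not yet time for a refetch (fetch scheduler)
        SCORE_TOO_LOW,      // generator score too low
        STATUS_RESTRICTED,  // status does not match restrict status
        EXPR_NOT_MATCHED    // Jexl expression not matched
    }

    // Only reasons that occurred at least once get an entry.
    private final Map<Reason, Long> counts = new EnumMap<>(Reason.class);

    /** Records one rejected item for the given reason. */
    public void reject(Reason reason) {
        counts.merge(reason, 1L, Long::sum);
    }

    /** Returns how many items were rejected for the given reason. */
    public long count(Reason reason) {
        return counts.getOrDefault(reason, 0L);
    }

    /** Logs one line per reason that occurred, in enum declaration order. */
    public void log() {
        counts.forEach((r, n) -> System.out.println("rejected (" + r + "): " + n));
    }

    public static void main(String[] args) {
        RejectionCounter c = new RejectionCounter();
        c.reject(Reason.SCHEDULE_REJECTED);
        c.reject(Reason.SCHEDULE_REJECTED);
        c.reject(Reason.SCORE_TOO_LOW);
        c.log();
    }
}
```

In a Hadoop MapReduce job the per-task maps above would not be merged across mappers; the more natural tool there is likely the job counters (e.g. `context.getCounter(groupName, counterName).increment(1)`), which Hadoop aggregates over all map tasks and reports when the job finishes.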