Sebastian Nagel created NUTCH-2737:
--------------------------------------

             Summary: Generator: count and log reason of rejections during selection
                 Key: NUTCH-2737
                 URL: https://issues.apache.org/jira/browse/NUTCH-2737
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.16
            Reporter: Sebastian Nagel
             Fix For: 1.17


During the map phase of the selection step, the generator rejects many (usually 
most) items for various reasons:
- not yet time for a refetch (returned by the fetch scheduler)
- generator score too low
- status does not match restrict status
- Jexl expression not matched

and some more. It would be useful if these reasons were counted and logged, 
especially when the CrawlDb gets bigger and multiple options to restrict the 
selection are used.
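
A minimal sketch of the idea, not the actual Generator code: one counter per 
rejection reason, incremented wherever an item is dropped and dumped once at 
the end. The class name RejectionCounters, the Reason constants and the 
summary format are all hypothetical; only the four reasons come from the list 
above. In a real map task the tallies would more likely go into Hadoop job 
counters than into an in-memory map.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper: tallies why items are rejected during selection. */
class RejectionCounters {

  /** Reasons from the list above; the constant names are assumptions. */
  enum Reason {
    SCHEDULE_REJECTED,  // not yet time for a refetch
    SCORE_TOO_LOW,      // generator score too low
    STATUS_MISMATCH,    // status does not match restrict status
    EXPR_NOT_MATCHED    // Jexl expression not matched
  }

  private final Map<Reason, Long> counts = new EnumMap<>(Reason.class);

  /** Record one rejected item for the given reason. */
  void reject(Reason reason) {
    counts.merge(reason, 1L, Long::sum);
  }

  /** Number of items rejected for the given reason (0 if none). */
  long count(Reason reason) {
    return counts.getOrDefault(reason, 0L);
  }

  /** Total number of rejected items across all reasons. */
  long total() {
    long sum = 0;
    for (long c : counts.values()) {
      sum += c;
    }
    return sum;
  }

  /** One "REASON=count" line per reason, suitable for a log message. */
  String summary() {
    StringBuilder sb = new StringBuilder();
    for (Reason r : Reason.values()) {
      sb.append(r).append('=').append(count(r)).append('\n');
    }
    return sb.toString();
  }
}
```

Each early return in the selection mapper would call reject(...) with the 
matching reason, and summary() would be logged once when the task finishes.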



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
