Arthur B created NUTCH-2328:

             Summary: GeneratorJob does not generate anything on second run
                 Key: NUTCH-2328
             Project: Nutch
          Issue Type: Bug
          Components: generator
    Affects Versions: 2.3.1, 2.2.1, 2.3, 2.2, 2.4, 2.5
         Environment: Ubuntu 16.04 / Hadoop 2.7.1
            Reporter: Arthur B

Given a topN parameter (e.g. 10), the GeneratorJob fails to generate anything 
new on subsequent runs within the same process space.
To reproduce the issue, submit the GeneratorJob to the M/R framework twice, one 
run after the other. The second run reports that it generated 0 URLs.
The issue is caused by the use of the static count field 
(org.apache.nutch.crawl.GeneratorReducer#count) to determine whether the topN 
value has been reached. Because the field is static, the value it holds at the 
end of the first run carries over into the second run in the same process.
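
The following is a minimal, self-contained sketch of the pattern described 
above. It is not the actual Nutch code: StaticCountDemo, StaticCountReducer, 
InstanceCountReducer and their reduce(List) methods are hypothetical stand-ins 
with no Hadoop dependency, and the per-instance counter is only one possible 
remedy, not necessarily the fix the project would adopt.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticCountDemo {

    /** Hypothetical reducer that tracks the topN limit in a static field. */
    public static class StaticCountReducer {
        // Shared by every instance in the JVM and never reset between runs.
        public static long count = 0;
        private final long limit;

        public StaticCountReducer(long limit) {
            this.limit = limit;
        }

        public List<String> reduce(List<String> urls) {
            List<String> selected = new ArrayList<>();
            for (String url : urls) {
                if (count >= limit) {
                    break; // limit already reached, possibly by an earlier run
                }
                selected.add(url);
                count++;
            }
            return selected;
        }
    }

    /** Hypothetical variant that keeps the counter per reducer instance. */
    public static class InstanceCountReducer {
        private long count = 0; // starts at zero for every new instance
        private final long limit;

        public InstanceCountReducer(long limit) {
            this.limit = limit;
        }

        public List<String> reduce(List<String> urls) {
            List<String> selected = new ArrayList<>();
            for (String url : urls) {
                if (count >= limit) {
                    break;
                }
                selected.add(url);
                count++;
            }
            return selected;
        }
    }

    /** Builds n placeholder URLs for the demonstration. */
    public static List<String> sampleUrls(int n) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            urls.add("http://example.com/page" + i);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = sampleUrls(12);
        long topN = 10;

        // Two "runs" in the same process, each with a fresh reducer instance.
        int staticFirst = new StaticCountReducer(topN).reduce(urls).size();
        int staticSecond = new StaticCountReducer(topN).reduce(urls).size();
        System.out.println("static counter:   " + staticFirst + ", " + staticSecond);

        int instanceFirst = new InstanceCountReducer(topN).reduce(urls).size();
        int instanceSecond = new InstanceCountReducer(topN).reduce(urls).size();
        System.out.println("instance counter: " + instanceFirst + ", " + instanceSecond);
    }
}
```

With the static counter, the second run selects nothing because count is 
already at the limit when it starts. This matches the "generated 0 URLs" 
symptom reported above.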
