Arthur B created NUTCH-2328:
Summary: GeneratorJob does not generate anything on second run
Issue Type: Bug
Affects Versions: 2.3.1, 2.2.1, 2.3, 2.2, 2.4, 2.5
Environment: Ubuntu 16.04 / Hadoop 2.7.1
Reporter: Arthur B
Given a topN parameter (e.g. 10), the GeneratorJob fails to generate anything
new on subsequent runs within the same process space.
To reproduce the issue, submit the GeneratorJob twice, one run after another,
to the M/R framework. The second run reports that it generated 0 URLs.
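This issue is due to the use of the static count field
(org.apache.nutch.crawl.GeneratorReducer#count) to determine whether the topN
value has been reached. Because the field is static, its value carries over
from the first run, so the second run starts with the topN limit already
reached.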
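
The effect of a static counter can be illustrated with a minimal, standalone
sketch. This is not the Nutch source: the class and method names below
(StaticCountSketch, StaticCountReducer, InstanceCountReducer, runStaticJob,
runInstanceJob, sampleUrls) are hypothetical stand-ins, and each "job" is
simply a new reducer object created in the same JVM. The per-instance counter
is shown only as one possible contrast, on the assumption that a fresh reducer
is created for each run.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticCountSketch {

    /** Hypothetical reducer that enforces topN with a static counter. */
    public static class StaticCountReducer {
        // Shared by every instance in the JVM and never reset between runs.
        public static long count = 0;
        private final long topN;

        public StaticCountReducer(long topN) {
            this.topN = topN;
        }

        public List<String> reduce(List<String> urls) {
            List<String> generated = new ArrayList<>();
            for (String url : urls) {
                if (count >= topN) {
                    break; // limit considered reached, possibly by an earlier run
                }
                generated.add(url);
                count++;
            }
            return generated;
        }
    }

    /** Hypothetical variant whose counter belongs to the reducer instance. */
    public static class InstanceCountReducer {
        private long count = 0; // starts at zero for every new reducer
        private final long topN;

        public InstanceCountReducer(long topN) {
            this.topN = topN;
        }

        public List<String> reduce(List<String> urls) {
            List<String> generated = new ArrayList<>();
            for (String url : urls) {
                if (count >= topN) {
                    break;
                }
                generated.add(url);
                count++;
            }
            return generated;
        }
    }

    /** Simulates one job run with the static counter; returns the number of URLs generated. */
    public static int runStaticJob(long topN, List<String> urls) {
        return new StaticCountReducer(topN).reduce(urls).size();
    }

    /** Simulates one job run with the per-instance counter. */
    public static int runInstanceJob(long topN, List<String> urls) {
        return new InstanceCountReducer(topN).reduce(urls).size();
    }

    /** Builds n placeholder URLs for the simulation. */
    public static List<String> sampleUrls(int n) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            urls.add("http://example.org/page" + i);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = sampleUrls(25);
        System.out.println("static, run 1:   " + runStaticJob(10, urls));
        System.out.println("static, run 2:   " + runStaticJob(10, urls));
        System.out.println("instance, run 1: " + runInstanceJob(10, urls));
        System.out.println("instance, run 2: " + runInstanceJob(10, urls));
    }
}
```

In this sketch the second static run finds the counter already at topN and
generates nothing, while the per-instance variant starts from zero on each
run. Whether a per-instance counter is the appropriate change for
GeneratorReducer itself is not established here.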