[
https://issues.apache.org/jira/browse/NUTCH-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402021#comment-16402021
]
ASF GitHub Bot commented on NUTCH-2536:
---------------------------------------
benmvachon opened a new pull request #298: NUTCH-2536 change
GeneratorReducer.count field to non-static variable…
URL: https://github.com/apache/nutch/pull/298
… for easier SDK experience
----------------------------------------------------------------
> GeneratorReducer.count is a static variable
> -------------------------------------------
>
> Key: NUTCH-2536
> URL: https://issues.apache.org/jira/browse/NUTCH-2536
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 2.3.1
> Reporter: Ben Vachon
> Priority: Minor
> Labels: Generate
> Fix For: 2.4
>
> Original Estimate: 2.4h
> Remaining Estimate: 2.4h
>
> The count field of the GeneratorReducer class is static. This means that if
> the GeneratorJob is run multiple times within the same JVM, the field keeps
> accumulating and counts the web pages generated across all batches.
> The count field is checked against the GeneratorJob's topN configuration
> variable, which is described as:
> "top threshold for maximum number of URLs permitted in a batch"
> I understand this to mean that EACH batch should be capped at the topN value,
> not ALL batches.
> This isn't a problem with the way that Nutch is typically used, because the
> script starts a new JVM each time. I'm not using the script; I'm calling the
> Java classes directly (using the ToolRunner) within an existing JVM, so I'm
> categorizing this as an SDK issue.
> Changing the field to be non-static will not affect the behavior of the class
> as it's run by the script.
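
The behavior described in the issue can be illustrated with a minimal sketch.
This is not the Nutch source: StaticCountReducer, InstanceCountReducer, accept,
and the runBatch* helpers are hypothetical stand-ins, and the only thing taken
from the report is the idea of a count field checked against topN. It assumes
that each batch creates a fresh reducer object inside one long-lived JVM.

```java
import java.util.Arrays;
import java.util.List;

public class StaticCountSketch {

    /** Hypothetical reducer whose counter is static, so it is shared JVM-wide. */
    public static class StaticCountReducer {
        public static long count = 0;   // survives across batches in the same JVM
        private final long topN;

        public StaticCountReducer(long topN) { this.topN = topN; }

        /** Accepts a URL only while the shared counter is below topN. */
        public boolean accept(String url) {
            if (count >= topN) {
                return false;
            }
            count++;
            return true;
        }
    }

    /** Hypothetical reducer whose counter is an instance field, so it resets per batch. */
    public static class InstanceCountReducer {
        private long count = 0;         // starts at zero for every new reducer
        private final long topN;

        public InstanceCountReducer(long topN) { this.topN = topN; }

        public boolean accept(String url) {
            if (count >= topN) {
                return false;
            }
            count++;
            return true;
        }
    }

    /** Simulates one batch with a fresh static-count reducer; returns URLs accepted. */
    public static int runBatchStatic(long topN, List<String> urls) {
        StaticCountReducer reducer = new StaticCountReducer(topN);
        int accepted = 0;
        for (String url : urls) {
            if (reducer.accept(url)) {
                accepted++;
            }
        }
        return accepted;
    }

    /** Simulates one batch with a fresh instance-count reducer; returns URLs accepted. */
    public static int runBatchInstance(long topN, List<String> urls) {
        InstanceCountReducer reducer = new InstanceCountReducer(topN);
        int accepted = 0;
        for (String url : urls) {
            if (reducer.accept(url)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        long topN = 3;

        // Two batches in the same JVM with the static counter.
        System.out.println(runBatchStatic(topN, urls));   // prints 3
        System.out.println(runBatchStatic(topN, urls));   // prints 0: cap already used up

        // Two batches in the same JVM with the instance counter.
        System.out.println(runBatchInstance(topN, urls)); // prints 3
        System.out.println(runBatchInstance(topN, urls)); // prints 3
    }
}
```

With the static field, the second batch generates nothing because the cap was
consumed by the first one. With the instance field, each batch is capped at
topN on its own, which matches the reading of topN given in the report. When a
new JVM is started for every run, both variants behave the same.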