[ https://issues.apache.org/jira/browse/NUTCH-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503207#comment-16503207 ]

ASF GitHub Bot commented on NUTCH-2574:
---------------------------------------

sebastian-nagel opened a new pull request #344: NUTCH-2574 Generator: hostCount >= maxCount comparison wrong
URL: https://github.com/apache/nutch/pull/344
 
 
   - ensure that the last created segment also contains maxCount URLs per host
   - use a local variable to hold the host-specific maxCount set in the HostDb (do not temporarily modify the instance variable)
   - fix Java compile warnings: add missing generic type parameters
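
The second change in the list above can be illustrated with a small sketch. This is hypothetical, simplified Java, not the code from the pull request: a `Map<String, Integer>` stands in for the host-specific maxCount values the Generator takes from the HostDb, and the class and method names are invented for illustration. The per-host limit is resolved into a local variable, so an override for one host cannot leak into the processing of the next host, as could happen if the instance field were overwritten and not restored.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Nutch code): per-host maxCount overrides are
 * resolved into a local variable; the instance field is never modified.
 */
public class LocalMaxCountSketch {
  // job-wide default, e.g. the value of generate.max.count
  private final int maxCount;
  // assumed stand-in for host-specific values taken from the HostDb
  private final Map<String, Integer> hostMaxCounts;

  LocalMaxCountSketch(int maxCount, Map<String, Integer> hostMaxCounts) {
    this.maxCount = maxCount;
    this.hostMaxCounts = hostMaxCounts;
  }

  /** Returns the limit for one host without touching this.maxCount. */
  int effectiveMaxCount(String host) {
    // local variable: the override is scoped to this call only
    int hostMaxCount = hostMaxCounts.getOrDefault(host, maxCount);
    return hostMaxCount;
  }

  int defaultMaxCount() {
    return maxCount;
  }

  public static void main(String[] args) {
    Map<String, Integer> overrides = new HashMap<>();
    overrides.put("example.com", 10);
    LocalMaxCountSketch sketch = new LocalMaxCountSketch(3, overrides);
    System.out.println(sketch.effectiveMaxCount("example.com")); // 10
    System.out.println(sketch.effectiveMaxCount("example.org")); // 3
    System.out.println(sketch.defaultMaxCount());                // 3
  }
}
```

Declaring the field `final` makes the "do not modify" rule explicit: the compiler rejects any attempt to overwrite it temporarily. The generic `Map<String, Integer>` type also avoids the raw-type warnings mentioned in the last bullet.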
   



> Generator: hostCount >= maxCount comparison wrong
> -------------------------------------------------
>
>                 Key: NUTCH-2574
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2574
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.13
>            Reporter: Michael Coffey
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>
> In the Generator.Selector.reduce function, hostCount[1] is compared to
> maxCount to determine whether to push the current URL to the next segment.
> The purpose is to honor generate.max.count.
> Sebastian noticed that it should test if (hostCount[1] > maxCount) rather
> than ">=". As it stands, the code sometimes puts one URL fewer into a segment
> than it should.
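
The off-by-one described in the issue can be reproduced with a simplified model. The sketch below is hypothetical Java, not the Nutch source of Generator.Selector.reduce. It assumes that hostCount[0] holds the current segment (1-based) and hostCount[1] the number of URLs counted for the host in that segment, that the current URL is counted before the comparison, and that the counter restarts at 0 (original `>=` variant) or 1 (fixed `>` variant) when a URL is pushed to the next segment; these reset values and all names are illustrative assumptions.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the per-host limit check discussed in
 * NUTCH-2574. Not the Nutch implementation; names are illustrative.
 */
public class HostCountSketch {

  /**
   * Distributes numUrls URLs of one host over segments 1..maxNumSegments,
   * allowing at most maxCount URLs of that host per segment.
   *
   * @param useGreaterThan true for the fixed test (hostCount[1] > maxCount),
   *                       false for the original test (hostCount[1] >= maxCount)
   * @return number of URLs of the host placed in each segment
   */
  static int[] assign(int numUrls, int maxCount, int maxNumSegments,
      boolean useGreaterThan) {
    int[] perSegment = new int[maxNumSegments];
    // hostCount[0]: current segment (1-based); hostCount[1]: URLs counted in it
    int[] hostCount = {1, 0};
    for (int url = 0; url < numUrls; url++) {
      hostCount[1]++; // the current URL is counted before the limit check
      boolean limitReached = useGreaterThan
          ? hostCount[1] > maxCount
          : hostCount[1] >= maxCount;
      if (limitReached) {
        if (hostCount[0] < maxNumSegments) {
          hostCount[0]++; // push the current URL to the next segment
          // Assumption: the fixed variant counts the moved URL in its new
          // segment (1); the original variant restarts from 0.
          hostCount[1] = useGreaterThan ? 1 : 0;
        } else {
          continue; // no segment left for this host: skip the URL
        }
      }
      perSegment[hostCount[0] - 1]++;
    }
    return perSegment;
  }

  public static void main(String[] args) {
    // maxCount = 3, 10 URLs from one host
    System.out.println(Arrays.toString(assign(10, 3, 1, false))); // [2]
    System.out.println(Arrays.toString(assign(10, 3, 1, true)));  // [3]
    System.out.println(Arrays.toString(assign(10, 3, 2, false))); // [2, 3]
    System.out.println(Arrays.toString(assign(10, 3, 2, true)));  // [3, 3]
  }
}
```

Because hostCount[1] already includes the URL being examined, `>=` fires when the segment holds only maxCount - 1 URLs of the host, which is the "one URL fewer" symptom; with `>` every segment in this model receives exactly maxCount URLs.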



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
