[ 
https://issues.apache.org/jira/browse/NUTCH-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305322#comment-16305322
 ] 

ASF GitHub Bot commented on NUTCH-2455:
---------------------------------------

okedoki commented on issue #254: fix for NUTCH-2455 more efficient usage of 
hostdb in generate
URL: https://github.com/apache/nutch/pull/254#issuecomment-354269379
 
 
   I found a bug in the partitioner that prevented the correct HostDb data 
from reaching the correct reducer. It is fixed.
   Second, I have applied the Eclipse auto-formatting as suggested by 
@lewismc . 
   
   For some reason, I have a conflict with Generator from master. I assume it 
happened because of the auto-formatting: instead of a correct comparison, the 
diff shows that the whole code of Generator is replaced. 
   
   What is the rule for fixing this case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Speed up the merging of HostDb entries for variable fetch delay
> ---------------------------------------------------------------
>
>                 Key: NUTCH-2455
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2455
>             Project: Nutch
>          Issue Type: Improvement
>          Components: generator
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>         Attachments: NUTCH-2455.patch
>
>
> Citing Sebastian at NUTCH-2420:
> ??The correct solution would be to use <host,score> pairs as keys in the 
> Selector job, with a partitioner and secondary sorting so that all keys with 
> same host end up in the same call of the reducer. If values can also hold a 
> HostDb entry and the sort comparator guarantees that the HostDb entry 
> (entries if partitioned by domain or IP) comes in front of all CrawlDb 
> entries. But that would be a substantial improvement...??
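
The scheme quoted above can be illustrated with a minimal plain-Java sketch. It is not the code from the patch or from Nutch: the class, field, and method names (SelectorKeySketch, HostScoreKey, partition, SORT, reducerInput) are all hypothetical, and the Hadoop machinery is only simulated with lists. It assumes partitioning by host (not domain or IP) and a descending score order among CrawlDb entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Nutch or Hadoop.
final class SelectorKeySketch {

  /** Composite key: host plus score, flagged when the value is a HostDb entry. */
  static final class HostScoreKey {
    final String host;
    final float score;
    final boolean hostDbEntry;

    HostScoreKey(String host, float score, boolean hostDbEntry) {
      this.host = host;
      this.score = score;
      this.hostDbEntry = hostDbEntry;
    }
  }

  /** Partition on the host alone, so all keys of one host reach the same reducer. */
  static int partition(HostScoreKey key, int numReducers) {
    return (key.host.hashCode() & Integer.MAX_VALUE) % numReducers;
  }

  /** Secondary sort: by host, then HostDb entry first, then descending score. */
  static final Comparator<HostScoreKey> SORT =
      Comparator.comparing((HostScoreKey k) -> k.host)
          .thenComparingInt(k -> k.hostDbEntry ? 0 : 1)
          .thenComparing((a, b) -> Float.compare(b.score, a.score));

  /** Simulates what one reducer would see: its partition, in sorted order. */
  static List<HostScoreKey> reducerInput(List<HostScoreKey> keys, int reducer,
      int numReducers) {
    List<HostScoreKey> mine = new ArrayList<>();
    for (HostScoreKey k : keys) {
      if (partition(k, numReducers) == reducer) {
        mine.add(k);
      }
    }
    mine.sort(SORT);
    return mine;
  }
}
```

If the score or the HostDb flag were included in the partition hash, keys of the same host could land on different reducers, which is the kind of problem a partitioner bug would cause. In a real Hadoop job a grouping comparator on the host would also be needed so that one reduce call receives all keys of a host.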



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
