[ https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516675 ]

Emmanuel Joke commented on NUTCH-530:
-------------------------------------

Actually, I don't reuse CrawlDbReducer; I've defined a new class to act as the
Combiner. This class only aggregates the scores of all CrawlDatum entries with
the status "Linked" into one CrawlDatum. It's just a part of what
CrawlDbReducer does. I've run a few tests on different cases, and the combiner
has no impact on the current score.
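
To illustrate the idea only (this is not the code in NUTCH-530.patch), here is
a minimal, dependency-free sketch of the aggregation step. The class
LinkedScoreCombinerSketch, the simplified Datum type, and the combine() method
are all hypothetical names; a real combiner would work on Nutch's CrawlDatum
inside the Hadoop combiner API. It also assumes that "aggregating" the scores
means summing them, which the comment above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkedScoreCombinerSketch {

    /** Hypothetical stand-in for CrawlDatum: only a status and a score. */
    public static final class Datum {
        public final String status;
        public final float score;

        public Datum(String status, float score) {
            this.status = status;
            this.score = score;
        }

        @Override
        public String toString() {
            return status + ":" + score;
        }
    }

    /**
     * Combines the values emitted by the map task for a single URL key.
     * Entries with status "linked" are merged into one entry whose score is
     * the sum of their scores (assumption: aggregation is a plain sum).
     * Entries with any other status pass through unchanged, in order.
     */
    public static List<Datum> combine(List<Datum> values) {
        List<Datum> out = new ArrayList<Datum>();
        float linkedScore = 0f;
        boolean sawLinked = false;
        for (Datum d : values) {
            if ("linked".equals(d.status)) {
                linkedScore += d.score;
                sawLinked = true;
            } else {
                out.add(d);
            }
        }
        if (sawLinked) {
            // One merged "linked" entry replaces all the individual ones.
            out.add(new Datum("linked", linkedScore));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Datum> values = new ArrayList<Datum>();
        values.add(new Datum("linked", 0.25f));
        values.add(new Datum("fetched", 1.0f));
        values.add(new Datum("linked", 0.5f));
        // Three input entries shrink to two: the fetched one plus one merged linked one.
        System.out.println(combine(values));
    }
}
```

Because only the "linked" entries are merged and everything else is forwarded
untouched, the reducer still sees every non-linked entry, and the total linked
score per URL is unchanged under the summing assumption; only the number of
records shuffled between map and reduce goes down.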

> Add a combiner to improve performance on updatedb
> -------------------------------------------------
>
>                 Key: NUTCH-530
>                 URL: https://issues.apache.org/jira/browse/NUTCH-530
>             Project: Nutch
>          Issue Type: Improvement
>         Environment: java 1.6
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-530.patch
>
>
> We have a lot of similar links with status "linked" generated at the output of 
> the map task when we try to update the crawldb based on the fetched segment.
> We can use a combiner to improve performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

