[ https://issues.apache.org/jira/browse/NUTCH-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516675 ]
Emmanuel Joke commented on NUTCH-530:
-------------------------------------

Actually, I don't re-use CrawlDbReducer; I've defined a new class as the Combiner. This class aggregates only the scores of all CrawlDatum entries with the status "linked" into one CrawlDatum. It's just a part of what CrawlDbReducer does. I've done a few tests in different cases, and it has no impact on the current score.

> Add a combiner to improve performance on updatedb
> -------------------------------------------------
>
>                 Key: NUTCH-530
>                 URL: https://issues.apache.org/jira/browse/NUTCH-530
>             Project: Nutch
>          Issue Type: Improvement
>         Environment: java 1.6
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-530.patch
>
>
> We have a lot of similar links with status "linked" generated at the output of
> the map task when we try to update the crawldb based on the segment fetched.
> We can use a combiner to improve the performance.

-- 
This message is automatically generated by JIRA.

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
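To illustrate the idea described in the comment, here is a minimal, dependency-free sketch of the map-side combine step: for one URL key, every entry with status "linked" is folded into a single entry, and all other entries pass through unchanged. It does not use Hadoop or Nutch classes. `Datum`, `Status` and `combine` are hypothetical stand-ins for CrawlDatum and a Hadoop combiner. The assumption that linked scores are merged by summation (as an OPIC-style scoring filter would accumulate inlink contributions) is mine, not confirmed by the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkedScoreCombiner {

    // Hypothetical status values; Nutch's real CrawlDatum uses byte status codes.
    enum Status { LINKED, FETCHED, DB_UNFETCHED }

    // Hypothetical stand-in for CrawlDatum, keeping only status and score.
    static final class Datum {
        final Status status;
        final float score;

        Datum(Status status, float score) {
            this.status = status;
            this.score = score;
        }

        @Override
        public String toString() {
            return status + ":" + score;
        }
    }

    /**
     * Combines all values emitted for a single URL key on the map side.
     * Assumption: "linked" scores are merged by summing them.
     * Non-linked entries are passed through untouched for the reducer.
     */
    static List<Datum> combine(List<Datum> values) {
        List<Datum> out = new ArrayList<>();
        float linkedScore = 0f;
        boolean sawLinked = false;
        for (Datum d : values) {
            if (d.status == Status.LINKED) {
                linkedScore += d.score;   // fold this link's contribution into the running sum
                sawLinked = true;
            } else {
                out.add(d);               // keep fetched/unfetched entries as they are
            }
        }
        if (sawLinked) {
            // Emit one aggregated "linked" entry instead of many similar ones.
            out.add(new Datum(Status.LINKED, linkedScore));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Datum> values = List.of(
                new Datum(Status.LINKED, 0.25f),
                new Datum(Status.LINKED, 0.5f),
                new Datum(Status.FETCHED, 1.0f),
                new Datum(Status.LINKED, 0.25f));
        System.out.println(combine(values));
    }
}
```

Because Hadoop may run a combiner zero, one or several times on partial groups of values, the merge must be associative and commutative. Summation satisfies this, so re-combining already-combined partial sums yields the same total the reducer would have computed. That is consistent with the observation that the combiner does not change the resulting scores.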