[ 
https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273335#comment-13273335
 ] 

Ferdy Galema commented on NUTCH-1366:
-------------------------------------

The cool part about Nutchgora is that inlinks are already populated for the row 
that is fed into the indexer. The DbUpdateReducer does this outlink 
inversion as part of updating the db.
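
To illustrate what inverting outlinks means, here is a minimal plain-Java 
sketch. It is not the actual DbUpdateReducer code: the class name 
InlinkSketch, the invert method and the example URLs are made up for the 
illustration, and it works on in-memory maps instead of the real web table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InlinkSketch {

    /**
     * Turns "page -> pages it links to" into "page -> pages that link to it".
     * Hypothetical helper for illustration only.
     */
    public static Map<String, Set<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, Set<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                // Each outlink of the source becomes an inlink of the target.
                inlinks.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new HashMap<>();
        outlinks.put("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", new ArrayList<>(Arrays.asList("http://c.example/")));
        System.out.println(invert(outlinks));
    }
}
```

Because the inlinks are stored with each row, the indexer can read them 
directly and does not need a grouping step of its own to collect them.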

Btw it's very simple to reinstate the reducer, if we need to have one again.
                
> speed up indexing by eliminating the indexreducer
> -------------------------------------------------
>
>                 Key: NUTCH-1366
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1366
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Ferdy Galema
>             Fix For: nutchgora
>
>         Attachments: NUTCH-1366.patch
>
>
> Currently the indexer in Nutchgora consists of both mappers and reducers. But 
> the reduce code does not actually iterate over any (grouped/sorted) values. 
> It simply indexes individual key/value (String/WebPage) pairs. Therefore, by 
> moving this indexing code to the mapper we can eliminate the reduce step, 
> making the indexing job much faster. (No more unnecessary spilling 
> to disk/network and no CPU wasted on sorting.)
> Note this is not (directly) applicable to trunk because trunk uses a quite 
> different approach: different types of input are combined into a single value 
> in the reducer. Although I think it is possible to implement a similar 
> optimization, I am not sure how to do this. So if anyone wants this for trunk 
> too, feel free to implement a similar patch.
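
The idea in the description can be sketched in plain Java without any 
Hadoop dependency. This is only a toy model under stated assumptions, not 
the code from NUTCH-1366.patch: MapOnlyIndexSketch, toDoc, indexWithReduce 
and indexInMapper are hypothetical names, and a page is reduced to a String.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class MapOnlyIndexSketch {

    /** Stand-in for the per-page indexing work (hypothetical). */
    public static String toDoc(String url, String page) {
        return url + " => " + page.toUpperCase();
    }

    /** Map + reduce shape: pairs are sorted and grouped before being indexed. */
    public static List<String> indexWithReduce(Map<String, String> rows) {
        // Simulated shuffle: sort by key and group the values per key.
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        rows.forEach((url, page) ->
                grouped.computeIfAbsent(url, k -> new ArrayList<>()).add(page));

        List<String> index = new ArrayList<>();
        // Simulated reduce: every key carries a single value, so nothing is combined.
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String page : entry.getValue()) {
                index.add(toDoc(entry.getKey(), page));
            }
        }
        return index;
    }

    /** Map-only shape: each pair is indexed as soon as it is read. */
    public static List<String> indexInMapper(Map<String, String> rows) {
        List<String> index = new ArrayList<>();
        rows.forEach((url, page) -> index.add(toDoc(url, page)));
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new HashMap<>();
        rows.put("http://b.example/", "second page");
        rows.put("http://a.example/", "first page");

        // Same documents either way; only the ordering may differ.
        boolean same = new HashSet<>(indexWithReduce(rows))
                .equals(new HashSet<>(indexInMapper(rows)));
        System.out.println("same documents: " + same);
    }
}
```

Both methods produce the same set of documents, which is why the sort and 
group step buys nothing here. In Hadoop itself a map-only job is normally 
requested with Job.setNumReduceTasks(0); whether the attached patch does 
exactly that is not stated in this thread.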

