[ 
https://issues.apache.org/jira/browse/MAHOUT-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-749:
--------------------------------

    Attachment: MAHOUT-749.patch

This patch changes the driver and mapper so that MeanShift can use multiple 
reducers. The driver now decreases the number of reducers in each iteration 
until it reaches 1. The mapper now sends each of its outputs to a different 
reducer, depending upon the number deployed in that iteration. The unit tests 
have been modified and run. The patch is ready for experimentation with 
larger datasets and multiple reducers specified by -Dmapred.reduce.tasks.
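
To illustrate the idea described above, here is a minimal, standalone sketch 
in plain Java with no Hadoop dependencies. It is not taken from the patch. The 
class name, the method names, the halving schedule, and the round-robin 
assignment are all assumptions made for illustration; the patch may shrink the 
reducer count and distribute mapper outputs differently.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and policies here are hypothetical.
public class ReducerScheduleSketch {

    // Assumed policy: halve the reducer count each iteration, never below 1.
    public static int nextReducerCount(int current) {
        return Math.max(1, current / 2);
    }

    // Assumed policy: spread a mapper's outputs round-robin over the
    // reducers deployed in the current iteration.
    public static int reducerFor(int outputIndex, int numReducers) {
        return outputIndex % numReducers;
    }

    // Reducer counts per iteration, starting from the configured value
    // and ending once a single reducer remains.
    public static List<Integer> schedule(int initialReducers) {
        List<Integer> counts = new ArrayList<>();
        int n = Math.max(1, initialReducers);
        counts.add(n);
        while (n > 1) {
            n = nextReducerCount(n);
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Reducer counts over successive iterations when starting with 8.
        System.out.println(schedule(8));
        // Which reducer each of five mapper outputs would go to with 3 reducers.
        for (int i = 0; i < 5; i++) {
            System.out.println("output " + i + " -> reducer " + reducerFor(i, 3));
        }
    }
}
```

With a shrinking schedule like this, early iterations can spread the work 
over many reducers, while the last iteration funnels everything through a 
single reducer.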

> MeanShift Cannot Use Multiple Reducers
> --------------------------------------
>
>                 Key: MAHOUT-749
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-749
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-749.patch
>
>
> The MeanShiftCanopy clustering job sets the numReducers=1 and this severely 
> limits its scalability for larger jobs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
