[ 
https://issues.apache.org/jira/browse/MAHOUT-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-626:
--------------------------------

    Fix Version/s:     (was: 0.5)

> T1 and T2 Values in Canopy (& MeanShift) 
> -----------------------------------------
>
>                 Key: MAHOUT-626
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-626
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: CanopyT3T4.patch
>
>
> Users report that T1 and T2 threshold values that work in sequential mode 
> do not work as well in mapreduce mode, because the mapper and the reducer 
> use the same values. When the mapper coalesces a number of points into a 
> single centroid, the distances change enough that the reducer needs 
> independent threshold values. 
> The attached patch implements optional T3 and T4 threshold values that 
> are used only by the canopy reducer. Convenience methods have been added for 
> API compatibility, and T3 and T4 default to T1 and T2 respectively. A new 
> unit test confirms that the thresholds are set correctly.
> If this proves to be an improvement, I will make the same changes to 
> MeanShift and commit them.
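
To illustrate the idea in the description, here is a minimal sketch of reducer-only thresholds that fall back to the mapper values. It is not the code from CanopyT3T4.patch and not Mahout's API: the class CanopyThresholds, its accessors, and the canopyCenters helper are hypothetical names. The helper also simplifies canopy generation to one-dimensional points with absolute distance, and it keeps the first uncovered point as a center instead of computing a centroid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical threshold holder; an illustration only, not Mahout's API. */
final class CanopyThresholds {
    private final double t1; // loose threshold used by the mapper
    private final double t2; // tight threshold used by the mapper
    private final double t3; // loose threshold used by the reducer
    private final double t4; // tight threshold used by the reducer

    /** Two-argument form for compatibility: T3 and T4 default to T1 and T2. */
    CanopyThresholds(double t1, double t2) {
        this(t1, t2, t1, t2);
    }

    CanopyThresholds(double t1, double t2, double t3, double t4) {
        this.t1 = t1;
        this.t2 = t2;
        this.t3 = t3;
        this.t4 = t4;
    }

    double mapperT1() { return t1; }
    double mapperT2() { return t2; }
    double reducerT1() { return t3; }
    double reducerT2() { return t4; }

    /**
     * Simplified one-pass canopy generation over 1-D points (an assumption
     * made for brevity). A point closer than tightThreshold to an existing
     * center is treated as covered; any other point starts a new canopy.
     */
    static List<Double> canopyCenters(double[] points, double tightThreshold) {
        List<Double> centers = new ArrayList<>();
        for (double p : points) {
            boolean covered = false;
            for (double c : centers) {
                if (Math.abs(p - c) < tightThreshold) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                centers.add(p);
            }
        }
        return centers;
    }
}
```

In this sketch, a mapper pass would call canopyCenters with mapperT2(), and the reducer pass would call it again on the mapper output with reducerT2(). With the two-argument constructor both passes use the same value, which matches the existing behaviour; with the four-argument constructor the reducer can use a wider threshold to merge centers that the mapper kept apart.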

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
