[ 
https://issues.apache.org/jira/browse/MAHOUT-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013980#comment-13013980
 ] 

Sean Owen commented on MAHOUT-645:
----------------------------------

A single uncompressed patch named "MAHOUT-645.patch" would be easiest to deal 
with.
I only just read the title... is this only adding a benchmark? Nothing 
wrong with that per se, but it seems a lot more useful to turn this optimization 
into a patch that improves the real code, instead of just demonstrating 
that it's an optimization (which it most certainly is).

> Elkan distance optimization for VectorBenchmarks class
> ------------------------------------------------------
>
>                 Key: MAHOUT-645
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-645
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>         Environment: Ubuntu Linux at Intel Core2 Duo P7450 @ 2.13GHz
>            Reporter: Gustavo Salazar Torres
>            Priority: Minor
>              Labels: centroid, clustering, elkan
>         Attachments: patches.zip
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Implementation of the first lemma of Elkan's optimization:
> Given three points x, b, c (where b and c are centroids):
>                      if d(b,c) >= 2d(x,b) then d(x,c) >= d(x,b)
> in which case we wouldn't need to calculate d(x,c). This is used to find the 
> closest centroid for every point x.
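
For illustration only, here is a minimal sketch of how the lemma quoted above
could be used to skip distance calculations when looking for the closest
centroid. It is not the attached patch and does not use any Mahout API: the
class ElkanLemmaSketch, its methods and the distanceCalls counter are all
hypothetical names, and plain Euclidean distance is assumed (the lemma relies
on the triangle inequality, so it would not hold for squared distances).

```java
// Hypothetical sketch, not Mahout code: nearest-centroid search using the
// first Elkan lemma, if d(b,c) >= 2d(x,b) then d(x,c) >= d(x,b).
final class ElkanLemmaSketch {

    /** Counts point-to-centroid distance computations, for illustration. */
    static int distanceCalls = 0;

    /** Euclidean distance, a true metric, so the triangle inequality holds. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Pairwise centroid distances, computed once and reused for every point. */
    static double[][] centroidDistances(double[][] centroids) {
        int k = centroids.length;
        double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                double d = distance(centroids[i], centroids[j]);
                cc[i][j] = d;
                cc[j][i] = d;
            }
        }
        return cc;
    }

    /** Returns the index of the centroid closest to x. */
    static int nearestCentroid(double[] x, double[][] centroids, double[][] cc) {
        int best = 0;
        distanceCalls++;
        double bestDist = distance(x, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            // Lemma: centroid c cannot be closer than the current best,
            // so d(x,c) does not need to be calculated.
            if (cc[best][c] >= 2.0 * bestDist) {
                continue;
            }
            distanceCalls++;
            double d = distance(x, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

The centroid-to-centroid table costs k*(k-1)/2 distance calculations up front,
so a saving is only to be expected when there are many more points than
centroids.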

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
