[
https://issues.apache.org/jira/browse/MAHOUT-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013980#comment-13013980
]
Sean Owen commented on MAHOUT-645:
----------------------------------
One patch named "MAHOUT-645.patch" and not compressed would be easiest to deal
with.
I just actually read the title... is this only adding a benchmark? There is
nothing wrong with that per se, but it seems a lot more useful to turn this
optimization into a patch that improves the real code, instead of just
demonstrating that it is an optimization (which it most certainly is).
> Elkan distance optimization for VectorBenchmarks class
> ------------------------------------------------------
>
> Key: MAHOUT-645
> URL: https://issues.apache.org/jira/browse/MAHOUT-645
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.4
> Environment: Ubuntu Linux at Intel Core2 Duo P7450 @ 2.13GHz
> Reporter: Gustavo Salazar Torres
> Priority: Minor
> Labels: centroid, clustering, elkan
> Attachments: patches.zip
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Implementation of first lemma of Elkan's optimization:
> Given three points x, b, c (where b and c are centroids):
> if d(b,c) >= 2d(x,b) then d(x,c) >= d(x,b)
> in which case we wouldn't need to calculate d(x,c). This is used to find the
> closest centroid for every point x.
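
For illustration, the lemma quoted above follows from the triangle inequality:
d(b,c) <= d(x,b) + d(x,c), so d(x,c) >= d(b,c) - d(x,b) >= 2d(x,b) - d(x,b) = d(x,b).
The sketch below shows one way the check might be used when assigning a point
to its closest centroid. It is not the code from the attached patch and it does
not use Mahout's API: the class ElkanLemmaSketch, its method names, the plain
double[] vectors and the choice of Euclidean distance are all assumptions made
for this example.

```java
public class ElkanLemmaSketch {

    // Euclidean distance between two vectors of equal length (assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Pairwise centroid-to-centroid distances, computed once per iteration.
    static double[][] centroidDistances(double[][] centroids) {
        int k = centroids.length;
        double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                cc[i][j] = distance(centroids[i], centroids[j]);
                cc[j][i] = cc[i][j];
            }
        }
        return cc;
    }

    // Returns the index of the closest centroid to x. A candidate c is skipped
    // without computing d(x,c) whenever d(b,c) >= 2 d(x,b), where b is the best
    // centroid found so far (the first lemma quoted in the issue).
    static int nearestCentroid(double[] x, double[][] centroids, double[][] cc) {
        int best = 0;
        double bestDist = distance(x, centroids[0]);
        for (int j = 1; j < centroids.length; j++) {
            if (cc[best][j] >= 2.0 * bestDist) {
                continue; // lemma guarantees d(x, centroids[j]) >= bestDist
            }
            double d = distance(x, centroids[j]);
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 0.0}, {0.0, 10.0}};
        double[][] cc = centroidDistances(centroids);
        System.out.println(nearestCentroid(new double[] {1.0, 1.0}, centroids, cc));
        System.out.println(nearestCentroid(new double[] {9.0, 1.0}, centroids, cc));
    }
}
```

The centroid-to-centroid table costs k(k-1)/2 distance computations per
iteration, so the check is only likely to pay off when there are many more
points than centroids.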