Dave,

This is great.
Do you know if the new streaming k-means has the same problem?

On Tue, Nov 5, 2013 at 3:02 PM, Dave DeBarr (JIRA) <[email protected]> wrote:

>
>      [ https://issues.apache.org/jira/browse/MAHOUT-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Dave DeBarr updated MAHOUT-1351:
> --------------------------------
>
>     Status: Patch Available  (was: Open)
>
> This simple "svn diff" (patch) resolves issue MAHOUT-1351
>
> > Adding DenseVector support to AbstractCluster
> > ---------------------------------------------
> >
> >                 Key: MAHOUT-1351
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1351
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Clustering
> >    Affects Versions: 0.8
> >            Reporter: Dave DeBarr
> >            Priority: Minor
> >              Labels: performance
> >             Fix For: 0.9
> >
> >         Attachments: MAHOUT-1351.patch
> >
> >   Original Estimate: 1h
> >  Remaining Estimate: 1h
> >
> > This improvement reduces runtime by 80% when performing k-means
> > clustering of Scale Invariant Feature Transform (SIFT) descriptors to
> > derive visual words for computer vision. Unlike sparse document vectors,
> > SIFT descriptors are dense. This improvement involves updating the
> > org.apache.mahout.clustering.AbstractCluster(Vector point, int id2)
> > constructor to use "point.clone()" instead of "new
> > RandomAccessSparseVector(point)" for creating the centroid. Also added
> > testKMeansSeqJobDenseVector() test for DenseVector processing.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
>
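
For anyone skimming the thread, here is a rough sketch of the idea in the quoted issue as I understand it: copying the point keeps whatever storage it already has, while wrapping it in a sparse vector always converts it. The classes below (Vec, DenseVec, SparseVec, CentroidSketch) are hypothetical stand-ins written for illustration. They are not Mahout's Vector classes and this is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Mahout's classes.
public class CentroidSketch {

    interface Vec {
        int size();
        double get(int i);
        Vec copy(); // plays the role of point.clone(): same storage type as the source
    }

    // Dense storage: one array slot per dimension.
    static final class DenseVec implements Vec {
        private final double[] values;
        DenseVec(double[] values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
        public Vec copy() { return new DenseVec(values); }
    }

    // Sparse storage: a hash map holding only the non-zero entries.
    static final class SparseVec implements Vec {
        private final int size;
        private final Map<Integer, Double> nonZero = new HashMap<>();
        SparseVec(Vec source) {
            this.size = source.size();
            for (int i = 0; i < size; i++) {
                double v = source.get(i);
                if (v != 0.0) {
                    nonZero.put(i, v);
                }
            }
        }
        public int size() { return size; }
        public double get(int i) { return nonZero.getOrDefault(i, 0.0); }
        public Vec copy() { return new SparseVec(this); }
    }

    // Old behaviour as described in the issue: the centroid is always sparse.
    static Vec centroidAlwaysSparse(Vec point) {
        return new SparseVec(point);
    }

    // New behaviour as described in the issue: the centroid keeps the point's representation.
    static Vec centroidFromCopy(Vec point) {
        return point.copy();
    }

    public static void main(String[] args) {
        Vec descriptor = new DenseVec(new double[] {12.0, 0.0, 7.5, 3.0});
        System.out.println(centroidAlwaysSparse(descriptor).getClass().getSimpleName());
        System.out.println(centroidFromCopy(descriptor).getClass().getSimpleName());
    }
}
```

With a dense input, the first call hands back the hash-backed type and the second keeps the array-backed one, which is presumably where the speedup for dense SIFT descriptors comes from.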
