Re: [jira] Commented: (MAHOUT-297) Canopy and Kmeans clustering slows down on using SeqAccVector for center

Jeff Eastman Tue, 27 Apr 2010 08:12:38 -0700

I'm not arguing it is a performance improvement for sparse vectors, just that changing the class of the vector should not be necessary: if the vectors being clustered are dense then the cluster constructors should leave them dense. If the vectors that are being clustered are of a sparse variety, then the constructors would use the same flavor for the clusters. I know I missed the previous discussion, but this change is violating the contract of the API of the constructors and it makes debugging new test cases that use dense vectors a PITA.


I'm still opposed to it.
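
To make the point about the constructor contract concrete, here is a minimal sketch of a cluster constructor that keeps the flavor of the vector it is given. Everything in it is hypothetical and written only for illustration: the Vec interface, DenseVec, SparseVec, Cluster and the like() method are stand-ins, not the Mahout classes or the code in the attached patches.

```java
import java.util.Map;
import java.util.TreeMap;

public class CenterFlavorSketch {

  /** Hypothetical minimal vector interface; a stand-in, not the Mahout API. */
  interface Vec {
    int size();
    double get(int i);
    void set(int i, double v);
    /** Returns an empty vector of the same concrete class and size. */
    Vec like();
  }

  /** Dense flavor: every element is stored in an array. */
  static final class DenseVec implements Vec {
    private final double[] values;
    DenseVec(int size) { values = new double[size]; }
    public int size() { return values.length; }
    public double get(int i) { return values[i]; }
    public void set(int i, double v) { values[i] = v; }
    public Vec like() { return new DenseVec(values.length); }
  }

  /** Sparse flavor: only non-zero elements are stored, ordered by index. */
  static final class SparseVec implements Vec {
    private final int size;
    private final Map<Integer, Double> values = new TreeMap<>();
    SparseVec(int size) { this.size = size; }
    public int size() { return size; }
    public double get(int i) { return values.getOrDefault(i, 0.0); }
    public void set(int i, double v) {
      if (v == 0.0) { values.remove(i); } else { values.put(i, v); }
    }
    public Vec like() { return new SparseVec(size); }
  }

  /** Hypothetical cluster whose center keeps the class of the seed point. */
  static final class Cluster {
    private final Vec center;
    Cluster(Vec point) {
      // Ask the point for an empty vector of its own class instead of
      // hard-coding one representation for every center.
      Vec copy = point.like();
      for (int i = 0; i < point.size(); i++) {
        double v = point.get(i);
        if (v != 0.0) { copy.set(i, v); }
      }
      center = copy;
    }
    Vec getCenter() { return center; }
  }

  public static void main(String[] args) {
    Vec dense = new DenseVec(3);
    dense.set(0, 1.0);
    Vec sparse = new SparseVec(1000);
    sparse.set(42, 7.0);
    // Each center reports the same class as the point it was built from.
    System.out.println(new Cluster(dense).getCenter().getClass().getSimpleName());
    System.out.println(new Cluster(sparse).getCenter().getClass().getSimpleName());
  }
}
```

With a constructor like this, a test case that builds clusters from dense vectors sees dense centers in the debugger, while sparse input still gets a sparse center.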


On 4/27/10 12:13 AM, Robin Anil (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861272#action_12861272 ]

Robin Anil commented on MAHOUT-297:
-----------------------------------

There was a discussion about this on the dev list. Check the util Vector 
Benchmarks and see how much faster clustering became after this change.  
It shouldn't necessarily be SeqAcc if the points are all dense vectors. But 
the obvious savings for sparse data far outweigh the slight loss in 
performance for dense data (you will see that in the vector benchmarks code).

Canopy and Kmeans clustering slows down on using SeqAccVector for center
------------------------------------------------------------------------

                 Key: MAHOUT-297
                 URL: https://issues.apache.org/jira/browse/MAHOUT-297
             Project: Mahout
          Issue Type: Improvement
          Components: Clustering
    Affects Versions: 0.4
            Reporter: Robin Anil
            Assignee: Robin Anil
             Fix For: 0.4

         Attachments: MAHOUT-297.patch, MAHOUT-297.patch, MAHOUT-297.patch, 
MAHOUT-297.patch, MAHOUT-297.patch
