[ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587223#action_12587223 ]
Samee Zahur commented on MAHOUT-20:
-----------------------------------

Some of the functions, like add or distance, seem to iterate through each dimension of the point in a conventional loop:

    for (int i = 0; i < z.cardinality(); i++) ...

or something like this. But with high-dimensional input, this seems to cancel out most of the advantages gained by using SparseVector. I mean, we are not taking advantage of the sparseness of the input data, and we loop through all the elements in all cases.

One possible alternative might be to add a sort of iterator mechanism to the Vector interface that would only visit non-null elements.

Samee

> Migrate Canopy and KMeans Implementations to Vectors
> ----------------------------------------------------
>
>                 Key: MAHOUT-20
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-20
>             Project: Mahout
>          Issue Type: Task
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Isabel Drost
>         Attachments: vectorClustering.txt
>
>
> Canopy and KMeans clustering implementations use Float[] representations
> instead of the new Vector package. They need to be migrated and the Vector
> package may need some enhancement to support the notion of payloads. This
> would be a good project for somebody new to the project who wants to get
> involved. If somebody wants to implement this, just assign the issue to
> yourself and I will hold off doing it myself.
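
To make the iterator suggestion in the comment above concrete, here is a rough Java sketch of how it might look. Everything in it is an assumption made for illustration: SparseVectorSketch, addInPlace and squaredDistance are hypothetical names, not the Mahout Vector or SparseVector API, and the TreeMap storage is just one simple way to keep only the non-zero entries.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; not Mahout's actual Vector or SparseVector. */
final class SparseVectorSketch implements Iterable<Map.Entry<Integer, Double>> {
    private final int cardinality;
    // Only non-zero entries are stored, keyed by their index.
    private final Map<Integer, Double> values = new TreeMap<>();

    SparseVectorSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    int cardinality() {
        return cardinality;
    }

    void set(int index, double value) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (value == 0.0) {
            values.remove(index); // keep the storage sparse
        } else {
            values.put(index, value);
        }
    }

    double get(int index) {
        Double v = values.get(index);
        return v == null ? 0.0 : v;
    }

    /** Visits only the stored (non-zero) elements, in index order. */
    @Override
    public Iterator<Map.Entry<Integer, Double>> iterator() {
        return Collections.unmodifiableMap(values).entrySet().iterator();
    }

    /** Adds other into this vector; work is proportional to other's non-zero count. */
    void addInPlace(SparseVectorSketch other) {
        for (Map.Entry<Integer, Double> e : other) {
            set(e.getKey(), get(e.getKey()) + e.getValue());
        }
    }

    /** Squared Euclidean distance, touching only indices that are non-zero in either vector. */
    double squaredDistance(SparseVectorSketch other) {
        double sum = 0.0;
        // Indices that are non-zero in this vector.
        for (Map.Entry<Integer, Double> e : this) {
            double d = e.getValue() - other.get(e.getKey());
            sum += d * d;
        }
        // Indices that are non-zero only in the other vector.
        for (Map.Entry<Integer, Double> e : other) {
            if (!values.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return sum;
    }
}
```

With two vectors of cardinality 1,000,000 that each hold two non-zero entries, both methods visit a handful of entries instead of running a million-step loop over cardinality().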