[ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587223#action_12587223 ]

Samee Zahur commented on MAHOUT-20:
-----------------------------------

Some of the functions, like add or distance, seem to iterate through every
dimension of the point in a conventional loop, something like this:

    for(int i=0;i<z.cardinality();i++) ......

With high-dimensional input, this seems to cancel out most of the advantages
gained by using SparseVector. We are not taking advantage of the sparseness of
the input data; we loop through all the elements in every case, zeros
included. One possible alternative might be to add some sort of iterator
mechanism to the Vector interface that visits only the non-null (non-zero)
elements.
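To make the idea concrete, here is a minimal sketch of such an iterator-based approach. It is not Mahout's actual Vector API: the class SparseVec and its methods nonZero(), plus() and squaredDistance() are hypothetical names chosen for illustration, and the TreeMap-backed storage is an assumption. add and distance loop only over the stored entries, so the work grows with the number of non-zeros (times a log factor for the map lookups), not with cardinality().

```java
import java.util.Map;
import java.util.TreeMap;

// Demo first: the single-file source launcher runs the first top-level class.
class SparseIterationDemo {
  public static void main(String[] args) {
    SparseVec a = new SparseVec(1_000_000);
    SparseVec b = new SparseVec(1_000_000);
    a.set(3, 1.0);
    a.set(500_000, 2.0);
    b.set(3, 4.0);
    b.set(999_999, -1.0);
    SparseVec sum = a.plus(b);
    System.out.println(sum.numNonZero());        // 3
    System.out.println(sum.get(3));              // 5.0
    System.out.println(a.squaredDistance(b));    // 14.0
  }
}

/** Hypothetical sparse vector that stores only non-zero entries, keyed by index. */
class SparseVec {
  private final int cardinality;
  private final TreeMap<Integer, Double> values = new TreeMap<>();

  SparseVec(int cardinality) {
    this.cardinality = cardinality;
  }

  int cardinality() {
    return cardinality;
  }

  int numNonZero() {
    return values.size();
  }

  double get(int index) {
    return values.getOrDefault(index, 0.0);
  }

  void set(int index, double value) {
    if (index < 0 || index >= cardinality) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    // Zeros are never stored, so the iterator below skips them for free.
    if (value == 0.0) {
      values.remove(index);
    } else {
      values.put(index, value);
    }
  }

  /** The proposed iterator: visits only stored (non-zero) entries, in index order. */
  Iterable<Map.Entry<Integer, Double>> nonZero() {
    return values.entrySet();
  }

  /** Element-wise sum; touches only the non-zeros of this and other. */
  SparseVec plus(SparseVec other) {
    if (other.cardinality != cardinality) {
      throw new IllegalArgumentException("cardinality mismatch");
    }
    SparseVec result = new SparseVec(cardinality);
    result.values.putAll(values);
    for (Map.Entry<Integer, Double> e : other.nonZero()) {
      int i = e.getKey();
      result.set(i, result.get(i) + e.getValue());
    }
    return result;
  }

  /** Squared Euclidean distance over the union of the two non-zero index sets. */
  double squaredDistance(SparseVec other) {
    double sum = 0.0;
    // Indices stored in this vector (other may be zero there).
    for (Map.Entry<Integer, Double> e : nonZero()) {
      double d = e.getValue() - other.get(e.getKey());
      sum += d * d;
    }
    // Indices stored only in other (this vector is zero there).
    for (Map.Entry<Integer, Double> e : other.nonZero()) {
      if (!values.containsKey(e.getKey())) {
        sum += e.getValue() * e.getValue();
      }
    }
    return sum;
  }
}
```

A real implementation could instead keep parallel sorted index/value arrays and merge two iterators in one pass, which avoids the per-lookup map cost; the map is used here only to keep the sketch short.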

Samee

> Migrate Canopy and KMeans Implementations to Vectors
> ----------------------------------------------------
>
>                 Key: MAHOUT-20
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-20
>             Project: Mahout
>          Issue Type: Task
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Isabel Drost
>         Attachments: vectorClustering.txt
>
>
> Canopy and KMeans clustering implementations use Float[] representations 
> instead of the new Vector package. They need to be migrated and the Vector 
> package may need some enhancement to support the notion of payloads. This 
> would be a good project for somebody new to the project who wants to get 
> involved. If somebody wants to implement this, just assign the issue to 
> yourself and I will hold off doing it myself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.