[ https://issues.apache.org/jira/browse/MAHOUT-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881989#comment-13881989 ]

Suneel Marthi commented on MAHOUT-1410:
---------------------------------------

[~pferrel], addressing your questions below:

Questions I'd ask in a review:
1) Are you changing the type of the vector in clusteredPoints to NamedVector, or
was it always that way with a blank name?

    We are now changing the vector type to NamedVector (otherwise I saw no way
    of storing the vectorIds).

2) If the class is changing, are we sure that doesn't break anything when a
named vector is actually used as clustering input?

    I tested this with both NamedVectors and non-NamedVectors and can confirm
    that NamedVectors work fine.
    To try it, run examples/bin/cluster-reuters.sh (option 1 for k-means); the
    script uses NamedVectors by default.

3) I'm not sure how to test "sequential" clustering, but if you tell me what you
mean, I can test that too.

    Add '-xm sequential' to the KMeansDriver call.
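The fix described in answer 1 amounts to carrying an identifier alongside each vector through the clustering step, so that every clusteredPoints entry can be traced back to its input. The Java sketch below illustrates only that idea and does not depend on Mahout: NamedPoint, ClusteredPoint, ensureNamed and assign are hypothetical stand-ins, not Mahout's NamedVector API or the code in the attached patch, and falling back to the input key as the id for unnamed vectors is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ClusteredPointsSketch {

    /** Hypothetical stand-in for a named vector: values plus an identifying name. */
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Hypothetical stand-in for one clusteredPoints entry. */
    static final class ClusteredPoint {
        final int clusterId;
        final NamedPoint point;
        final double distanceSquared;

        ClusteredPoint(int clusterId, NamedPoint point, double distanceSquared) {
            this.clusterId = clusterId;
            this.point = point;
            this.distanceSquared = distanceSquared;
        }
    }

    /** Keeps an existing name; otherwise (assumption) uses the input key as the id. */
    static NamedPoint ensureNamed(String existingName, String inputKey, double[] values) {
        return new NamedPoint(existingName != null ? existingName : inputKey, values);
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Assigns a point to its nearest centroid; the name travels with the output row. */
    static ClusteredPoint assign(NamedPoint p, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(p.values, centroids[c]);
            if (d < bestDist) {
                best = c;
                bestDist = d;
            }
        }
        return new ClusteredPoint(best, p, bestDist);
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        List<ClusteredPoint> out = new ArrayList<>();
        // One already-named input and one unnamed input (name == null).
        out.add(assign(ensureNamed("doc-a", "0", new double[]{1.0, 1.0}), centroids));
        out.add(assign(ensureNamed(null, "1", new double[]{9.0, 10.0}), centroids));
        for (ClusteredPoint cp : out) {
            // clusterId, vector id, distance-squared: every row is now traceable.
            System.out.println(cp.clusterId + "\t" + cp.point.name + "\t" + cp.distanceSquared);
        }
    }
}
```

Running main prints two rows, "0 doc-a 2.0" and "1 1 1.0" (tab-separated). Without the name, the second row would carry a cluster id and distance but nothing identifying which input vector produced it, which is exactly the problem reported in this issue.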

> clusteredPoints do not contain a vector id
> ------------------------------------------
>
>                 Key: MAHOUT-1410
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1410
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.8
>         Environment: using 0.9 release candidate
>            Reporter: Pat Ferrel
>            Assignee: Suneel Marthi
>             Fix For: 0.9
>
>         Attachments: MAHOUT-1410.patch, MAHOUT-1410.patch
>
>
> When clustering non-named vectors, there are no vector ids in clusteredPoints,
> so the other values stored there (cluster id, vector values, distance-squared,
> pdf) cannot be tied to any known vector.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
