[
https://issues.apache.org/jira/browse/MAHOUT-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881989#comment-13881989
]
Suneel Marthi commented on MAHOUT-1410:
---------------------------------------
[~pferrel] Addressing your questions below:
Questions I'd ask in a review:
1) Are you changing the type of the vector in clusteredPoints to NamedVector or
was it always that way with a blank name?
We are now changing the vector type to NamedVector (otherwise I didn't see a
way of storing the vector ids).
2) If the class is changing are we sure that doesn't mess things up when
actually using a named vector as clustering input?
I tested this with both NamedVectors and non-NamedVectors and can confirm
that NamedVectors work fine as clustering input.
Try running examples/bin/cluster-reuters.sh (option 1 for kmeans); by
default the script uses NamedVectors.
3) Not sure how to test "sequential" clustering but if you tell me what you
mean I can test that too.
Add '-xm sequential' to the KMeans Driver call.
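As a rough sketch of what that looks like on the command line: only
'-xm sequential' comes from this comment. The WORK_DIR location, the
directory names under it, and the remaining options are assumptions modeled
on the k-means call in examples/bin/cluster-reuters.sh, so check them against
the driver's help output before relying on them. The kmeans_sequential helper
is a hypothetical name used only for this illustration.

```shell
# Assumed locations; adjust to wherever your vectors actually live.
WORK_DIR="${WORK_DIR:-/tmp/mahout-work}"
MAHOUT="${MAHOUT:-mahout}"

# Hypothetical helper: runs "<prefix> kmeans ..." with sequential execution.
# Pass "echo mahout" as the prefix for a dry run that only prints the command.
kmeans_sequential() {
  "$@" kmeans \
    -i "${WORK_DIR}/tfidf-vectors" \
    -c "${WORK_DIR}/kmeans-initial-clusters" \
    -o "${WORK_DIR}/kmeans-output" \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -x 10 -k 20 -ow --clustering \
    -xm sequential
}

# Dry run: print the command instead of executing it.
kmeans_sequential echo "${MAHOUT}"
```

To actually run the job, call kmeans_sequential "${MAHOUT}" instead of the
dry run. Leaving out '-xm sequential' runs the same job with the default
MapReduce execution.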
> clusteredPoints do not contain a vector id
> ------------------------------------------
>
> Key: MAHOUT-1410
> URL: https://issues.apache.org/jira/browse/MAHOUT-1410
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.8
> Environment: using 0.9 release candidate
> Reporter: Pat Ferrel
> Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1410.patch, MAHOUT-1410.patch
>
>
> When clustering non-named vectors, there are no vector ids in clusteredPoints,
> so the other values there (cluster id, vector values, distance-squared, pdf)
> cannot be tied to any known vector.