[
https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463838#comment-13463838
]
Jeff Eastman commented on MAHOUT-1080:
--------------------------------------
Using the WritableComparable key from the vector input file as an identifier
certainly seems reasonable. We introduced NamedVectors a long time ago to allow
for identifiers to pass through the clustering classification phase and most
current Mahout applications take this approach. I'm not sure a new writable
needs to be introduced here. We could also modify the
ClusterClassificationMapper to emit a NamedVector with the key in it if the
VectorWritable was not already named.
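
The fallback described above (keep an existing name, otherwise use the input
key as the name) could look roughly like the sketch below. It is only an
illustration under stated assumptions: Vec, DenseVec, NamedVec and ensureNamed
are hypothetical stand-ins written so the example compiles on its own. They are
not Mahout classes and this is not the attached patch. In Mahout itself the
corresponding types would be Vector, NamedVector and VectorWritable, and the
logic would sit in ClusterClassificationMapper.

```java
import java.util.Arrays;

public class NamedVectorSketch {

    // Hypothetical stand-in for a vector type; only what the sketch needs.
    interface Vec {
        double[] values();
    }

    // Hypothetical unnamed vector backed by a plain array.
    static final class DenseVec implements Vec {
        private final double[] values;

        DenseVec(double[] values) {
            this.values = values.clone();
        }

        @Override
        public double[] values() {
            return values.clone();
        }
    }

    // Hypothetical stand-in for a named vector: a delegate plus a name.
    static final class NamedVec implements Vec {
        private final Vec delegate;
        private final String name;

        NamedVec(Vec delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        String name() {
            return name;
        }

        @Override
        public double[] values() {
            return delegate.values();
        }
    }

    // Keep an existing name; otherwise use the input key as the name.
    static NamedVec ensureNamed(Object key, Vec vector) {
        if (vector instanceof NamedVec) {
            return (NamedVec) vector;
        }
        return new NamedVec(vector, String.valueOf(key));
    }

    public static void main(String[] args) {
        // Unnamed input: the key becomes the identifier.
        NamedVec fromKey = ensureNamed(42, new DenseVec(new double[] {1.0, 2.0}));
        // Already-named input: the existing name wins over the key.
        NamedVec alreadyNamed =
            ensureNamed(43, new NamedVec(new DenseVec(new double[] {3.0}), "doc-a"));

        System.out.println(fromKey.name() + " " + Arrays.toString(fromKey.values()));
        System.out.println(alreadyNamed.name() + " " + Arrays.toString(alreadyNamed.values()));
    }
}
```

Either way the identifier travels with the vector itself, so the clustered
output would not need a new writable type to carry it.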
> Kmeans clustered output loses vectorId given in the input
> ----------------------------------------------------------
>
> Key: MAHOUT-1080
> URL: https://issues.apache.org/jira/browse/MAHOUT-1080
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.7
> Reporter: Smita Wadhwa
> Fix For: 0.8
>
> Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is a sequence of (IntWritable, VectorWritable) pairs,
> and the output of clustered points is clusterId with
> WeightedVectorWritable(vector,distance-from-the-centre).
> The id of the input vector is lost in this processing.