[
https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464038#comment-13464038
]
Pat Ferrel commented on MAHOUT-1080:
------------------------------------
The inconsistent support for NamedVector seems to be an issue with Mahout in
general. If you don't use a NamedVector, your clustered points will have no ids.
In another issue, NamedVectors were just added to the output of SSVD. It would
be nice to have a more expressive version of a vector that goes through the
whole analysis pipeline.
However, I'd vote for the WeightedPropertyVectorWritable, which seems a more
general solution and already exists. There are several, if not many, things
that would be nice to associate with a vector at some point in the analysis
pipeline (distance to centroid, name, some-external-key, pdf, etc.). Why not
adopt it as a standard for i/o of jobs that can support it? Then add the
properties that make sense for each pipeline task. It would do away with the
need for several dictionaries, methinks.
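
To make the idea concrete, here is a rough sketch of a vector that carries a
property map through a clustering step. It is plain Java and does not use
Mahout: PropertyVector, PropertyVectorSketch, assign() and the property keys
"name", "cluster-id" and "distance-to-centroid" are all hypothetical names
chosen for illustration, not the actual WeightedPropertyVectorWritable API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a "vector plus properties" record.
// This is NOT Mahout's WeightedPropertyVectorWritable API.
final class PropertyVector {
    final double weight;
    final double[] values;
    final Map<String, String> properties = new LinkedHashMap<>();

    PropertyVector(double weight, double[] values) {
        this.weight = weight;
        this.values = values.clone();
    }

    // Attach a property and return this, so calls can be chained.
    PropertyVector with(String key, String value) {
        properties.put(key, value);
        return this;
    }
}

final class PropertyVectorSketch {
    // Euclidean distance between two points of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Assign the point to its nearest centroid. Properties already on the
    // input (such as its name) are copied to the output, and this step adds
    // its own properties next to them instead of replacing them.
    static PropertyVector assign(PropertyVector point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = distance(point.values, centroids[c]);
            if (d < bestDistance) {
                best = c;
                bestDistance = d;
            }
        }
        PropertyVector out = new PropertyVector(point.weight, point.values);
        out.properties.putAll(point.properties);
        return out.with("cluster-id", String.valueOf(best))
                  .with("distance-to-centroid", String.valueOf(bestDistance));
    }

    public static void main(String[] args) {
        PropertyVector in =
            new PropertyVector(1.0, new double[] {1.0, 1.0}).with("name", "doc-42");
        double[][] centroids = {{0.0, 0.0}, {1.0, 2.0}};
        PropertyVector out = assign(in, centroids);
        System.out.println(out.properties);
    }
}
```

The point of the sketch is only that each pipeline task adds its own keys and
leaves the existing ones alone, so the id given in the input is still there
after clustering and no separate dictionary is needed to recover it.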
> Kmeans clustered output loses vectorId given in the input
> ---------------------------------------------------------
>
> Key: MAHOUT-1080
> URL: https://issues.apache.org/jira/browse/MAHOUT-1080
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.7
> Reporter: Smita Wadhwa
> Fix For: 0.8
>
> Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is IntWritable and VectorWritable,
> and the output of clustered points is clusterId and
> WeightedVectorWritable(vector, distance-from-the-centre).
> The id of the input vector is lost in this processing.