[ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464038#comment-13464038 ]

Pat Ferrel commented on MAHOUT-1080:
------------------------------------

The inconsistent support for NamedVector seems to be an issue with Mahout in
general. If you don't use a NamedVector, your clustered points will have no
ids. In another issue, NamedVectors were just added to the output of SSVD. It
would be nice to have a more expressive version of a vector that goes through
the whole analysis pipeline.

However, I'd vote for the WeightedPropertyVectorWritable, which seems a more
general solution and already exists. There are several, if not many, things
that would be nice to associate with a vector at some point in the analysis
pipeline (distance to centroid, name, some external key, pdf, etc.). Why not
adopt it as a standard for the I/O of jobs that can support it? Then each
pipeline task could add the properties that make sense for it. It would do
away with the need for several dictionaries, methinks.
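To make the "properties travel with the vector" idea concrete, here is a minimal, language-neutral sketch in Python. `PropertyVector` and `assign_to_clusters` are hypothetical stand-ins, not Mahout's API; the string-valued property map is an assumption loosely modeled on the idea behind WeightedPropertyVectorWritable. The point is that a clustering step adds its own property ("distance") while leaving upstream properties such as "name" intact, so no side dictionary is needed to recover ids.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class PropertyVector:
    """Hypothetical weighted vector that carries string-valued properties."""
    weight: float
    values: List[float]
    properties: Dict[str, str] = field(default_factory=dict)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_clusters(
    points: List[PropertyVector], centroids: List[List[float]]
) -> List[Tuple[int, PropertyVector]]:
    """Assign each point to its nearest centroid.

    Existing properties (e.g. "name") are copied through unchanged and the
    distance to the chosen centroid is recorded as a new "distance" property.
    """
    out = []
    for p in points:
        distances = [euclidean(p.values, c) for c in centroids]
        cluster_id = min(range(len(centroids)), key=distances.__getitem__)
        props = dict(p.properties)  # copy so the input record is not mutated
        props["distance"] = repr(distances[cluster_id])
        out.append((cluster_id, PropertyVector(p.weight, p.values, props)))
    return out


# Usage: two named points, two centroids.
points = [
    PropertyVector(1.0, [0.0, 0.0], {"name": "doc-1"}),
    PropertyVector(1.0, [9.0, 9.0], {"name": "doc-2"}),
]
centroids = [[1.0, 0.0], [10.0, 10.0]]
result = assign_to_clusters(points, centroids)
```

Each later pipeline stage would follow the same pattern: copy the incoming properties and add only the keys it owns.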
                
> Kmeans clustered output loses vectorId given in the input
> ----------------------------------------------------------
>
>                 Key: MAHOUT-1080
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1080
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Smita Wadhwa
>             Fix For: 0.8
>
>         Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is IntWritable and VectorWritable,
> and the output of clustered points is clusterId and
> WeightedVectorWritable(vector, distance-from-the-centre).
> The id of the vector is lost in this processing.
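The reported loss can be illustrated with a small Python sketch of a single k-means assignment pass. The function names here are hypothetical, and this is not the attached kMeansClusterVectorId.diff. It only shows the shape of the problem: input records are `(key, vector)` pairs, the lossy variant emits `(clusterId, (distance, vector))` and discards the key, while the alternative simply threads the key through to the output.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(vector: Vector, centroids: List[Vector]) -> Tuple[int, float]:
    """Return (cluster_id, distance) of the closest centroid (Euclidean)."""
    best = min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))
    return best, math.dist(vector, centroids[best])


def cluster_points_lossy(records, centroids):
    """Mirrors the reported behaviour: the input key is dropped."""
    out = []
    for _key, vec in records:
        cid, d = nearest(vec, centroids)
        out.append((cid, (d, vec)))
    return out


def cluster_points_keeping_id(records, centroids):
    """Variant that carries the input key into the clustered output."""
    out = []
    for key, vec in records:
        cid, d = nearest(vec, centroids)
        out.append((cid, (key, d, vec)))
    return out


# Usage: integer keys stand in for the IntWritable ids of the input.
records = [(101, [0.0, 0.0]), (202, [9.0, 9.0])]
centroids = [[1.0, 0.0], [10.0, 10.0]]
lossy = cluster_points_lossy(records, centroids)
kept = cluster_points_keeping_id(records, centroids)
```

`math.dist` needs Python 3.8 or later. Carrying the key in the value (rather than in a separate lookup table) keeps the id attached to the vector however the output is later regrouped.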

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
