[jira] [Commented] (MAHOUT-1080) Kmeans clustered output losses vectorId given in the input

Jeff Eastman (JIRA) Wed, 26 Sep 2012 07:05:10 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463838#comment-13463838 ]


Jeff Eastman commented on MAHOUT-1080:
--------------------------------------

Using the WritableComparable key from the vector input file as an identifier 
certainly seems reasonable. We introduced NamedVectors a long time ago to allow 
identifiers to pass through the clustering classification phase, and most 
current Mahout applications take this approach. I'm not sure a new Writable 
needs to be introduced here. We could also modify the 
ClusterClassificationMapper to emit a NamedVector with the key in it if the 
vector inside the VectorWritable is not already named.
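
A minimal sketch of that last idea follows. It uses simplified stand-in classes (SimpleVector, SimpleNamedVector) rather than Mahout's actual Vector, NamedVector, or ClusterClassificationMapper, and the helper name nameIfUnnamed is hypothetical. It only illustrates the "wrap with the key unless already named" logic, not how the real mapper would be patched.

```java
class NameIfUnnamed {

    // Stand-in for a plain vector; NOT Mahout's Vector class.
    static class SimpleVector {
        final double[] values;

        SimpleVector(double[] values) {
            this.values = values;
        }
    }

    // Stand-in for a vector that carries an identifier; NOT Mahout's NamedVector.
    static class SimpleNamedVector extends SimpleVector {
        final String name;

        SimpleNamedVector(double[] values, String name) {
            super(values);
            this.name = name;
        }
    }

    // Hypothetical helper: keep an existing name, otherwise use the
    // input key's string form as the vector's name.
    static SimpleNamedVector nameIfUnnamed(Object key, SimpleVector vector) {
        if (vector instanceof SimpleNamedVector) {
            return (SimpleNamedVector) vector;
        }
        return new SimpleNamedVector(vector.values, String.valueOf(key));
    }

    public static void main(String[] args) {
        SimpleVector unnamed = new SimpleVector(new double[] {1.0, 2.0});
        SimpleVector named = new SimpleNamedVector(new double[] {3.0}, "doc-7");

        // The unnamed vector picks up the key; the named one keeps its own name.
        System.out.println(nameIfUnnamed(42, unnamed).name);
        System.out.println(nameIfUnnamed(43, named).name);
    }
}
```

With this approach the identifier travels inside the vector itself, so the clustered output format would not need a new Writable type.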
                
> Kmeans clustered output losses vectorId given in the input
> ----------------------------------------------------------
>
>                 Key: MAHOUT-1080
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1080
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.7
>            Reporter: Smita Wadhwa
>             Fix For: 0.8
>
>         Attachments: kMeansClusterVectorId.diff
>
>
> The input to Kmeans is a sequence of (IntWritable, VectorWritable) pairs, 
> and the output of clustered points is (clusterId, 
> WeightedVectorWritable(vector, distance-from-the-centre)). 
> The id of the input vector is lost in this processing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
