[ https://issues.apache.org/jira/browse/MAHOUT-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935402#action_12935402 ]
Jeff Eastman commented on MAHOUT-552:
-------------------------------------

This will not resolve your issue and is incorrect. The cluster centers are initialized from an input vector, but subsequent observations of other input vectors will cause this to be recomputed to be the centroid of all the observed vectors. Any significance of retaining the first vector's NamedVector would be lost during this calculation. The cluster centroids are the results of many observations. Not correct to have them be named.

I think what you really want is to run the clustering job with the -cl option (not the default). This will compute the clusters into a clusters-n directory and then cluster (classify) all of the input vectors into a clusteredPoints directory. This directory will contain sequence files where the key is a clusterId and the values are WeightedVectorWritables. These will have a weight (1 in k-means & canopy, some value < 1 for fuzzyK and Dirichlet) and your initial input vector. If that vector was a NamedVector then the output will also be a NamedVector, preserving your documentId.

> AbstractCluster eliminates NamedVectors by replacing them with RandomAccessSparseVector always
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-552
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-552
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Pere Ferrera Bertran
>             Fix For: 0.5
>
>         Attachments: MAHOUT-552.patch
>
>
> When clustering using NamedVectors as input - after running seq2sparse with
> patch https://issues.apache.org/jira/browse/MAHOUT-401 - names are lost
> because AbstractCluster replaces vectors coming in the constructor with
> RandomAccessSparseVector.
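
To illustrate the point made in the comment, here is a rough sketch in plain Java. It is not Mahout code: the class CentroidNamesSketch, the NamedPoint type, and the centroid and assign methods are all made up for this example. It only shows why a centroid averaged from several named inputs has no single name to keep, while a separate classification pass (what the comment says the -cl option does) can still report each input's name next to its cluster index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CentroidNamesSketch {

    /** Hypothetical stand-in for a named input vector; not Mahout's NamedVector. */
    public static final class NamedPoint {
        public final String name;
        public final double[] values;

        public NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Component-wise mean of the observed points. The result has no name to keep. */
    public static double[] centroid(List<NamedPoint> points) {
        int dim = points.get(0).values.length;
        double[] mean = new double[dim];
        for (NamedPoint p : points) {
            for (int i = 0; i < dim; i++) {
                mean[i] += p.values[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            mean[i] /= points.size();
        }
        return mean;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Classification pass: cluster index -> names of the points nearest to that centroid. */
    public static Map<Integer, List<String>> assign(List<NamedPoint> points,
                                                    List<double[]> centroids) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (NamedPoint p : points) {
            int best = 0;
            for (int c = 1; c < centroids.size(); c++) {
                if (squaredDistance(p.values, centroids.get(c))
                        < squaredDistance(p.values, centroids.get(best))) {
                    best = c;
                }
            }
            byCluster.computeIfAbsent(best, k -> new ArrayList<>()).add(p.name);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        NamedPoint a = new NamedPoint("doc-a", new double[] {0.0, 0.0});
        NamedPoint b = new NamedPoint("doc-b", new double[] {2.0, 0.0});
        NamedPoint c = new NamedPoint("doc-c", new double[] {10.0, 10.0});

        // Each centroid is an average of observations, so it is just numbers.
        List<double[]> centroids = Arrays.asList(
                centroid(Arrays.asList(a, b)),
                centroid(Arrays.asList(c)));

        // The names survive only on the classified input points.
        System.out.println(assign(Arrays.asList(a, b, c), centroids));
    }
}
```

The names are kept on the per-point output rather than on the cluster itself, which matches the comment's suggestion to read documentIds from the classified points and not from the cluster centers.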