[ https://issues.apache.org/jira/browse/MAHOUT-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018122#comment-13018122 ]
Jeff Eastman commented on MAHOUT-552:
-------------------------------------

The initial processing step (createCanopyFromVectors) was using the default MeanShiftCanopy constructor, which inherited the AbstractCanopy default constructor, which converts all centers to RandomAccessSparseVectors. Since the clustering (classification) step is done from these initial canopies rather than from the original input vectors, the type of the incoming vectors was lost. This is especially problematic when the input vector is a NamedVector.

I've created a static method, initialCanopy(), to use for this initial step; it retains the original input vector's center type. I've added a unit test and verified that the type is retained. Committing shortly.

> AbstractCluster eliminates NamedVectors by replacing them with RandomAccessSparseVector always
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-552
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-552
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Pere Ferrera Bertran
>            Assignee: Jeff Eastman
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-552.patch
>
>
> When clustering using NamedVectors as input - after running seq2sparse with
> patch https://issues.apache.org/jira/browse/MAHOUT-401 - names are lost
> because AbstractCluster replaces vectors coming in the constructor with
> RandomAccessSparseVector.
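
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the behaviour described in the comment above. It does not use Mahout's real classes: Vec, DenseVec, SparseVec, NamedVec and Canopy are hypothetical stand-ins that only loosely mirror Vector, DenseVector, RandomAccessSparseVector, NamedVector and MeanShiftCanopy, and the only name taken from the comment is initialCanopy(). The actual patch may well differ in its details.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-ins only; these are NOT Mahout's real classes or signatures.
final class CanopyTypeSketch {

    interface Vec {
        int size();
        double get(int i);
    }

    /** Plain dense vector. */
    static final class DenseVec implements Vec {
        private final double[] values;
        DenseVec(double... values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
    }

    /** Sparse copy of another vector (plays the role of RandomAccessSparseVector). */
    static final class SparseVec implements Vec {
        private final int size;
        private final Map<Integer, Double> nonZero = new TreeMap<>();
        SparseVec(Vec source) {
            this.size = source.size();
            for (int i = 0; i < size; i++) {
                double v = source.get(i);
                if (v != 0.0) {
                    nonZero.put(i, v);
                }
            }
        }
        public int size() { return size; }
        public double get(int i) { return nonZero.getOrDefault(i, 0.0); }
    }

    /** Wrapper that attaches a name to a vector (plays the role of NamedVector). */
    static final class NamedVec implements Vec {
        private final Vec delegate;
        private final String name;
        NamedVec(Vec delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }
        String getName() { return name; }
        public int size() { return delegate.size(); }
        public double get(int i) { return delegate.get(i); }
    }

    static final class Canopy {
        private Vec center;

        /** Problematic path: the center is always copied into a sparse vector, so the name is dropped. */
        Canopy(Vec point) { this.center = new SparseVec(point); }

        private Canopy() { }

        /** Factory for the initial step: keeps the incoming vector, and therefore its type, as the center. */
        static Canopy initialCanopy(Vec point) {
            Canopy canopy = new Canopy();
            canopy.center = point;
            return canopy;
        }

        Vec getCenter() { return center; }
    }

    public static void main(String[] args) {
        Vec input = new NamedVec(new DenseVec(1.0, 0.0, 2.0), "doc-1");

        Canopy converted = new Canopy(input);
        Canopy retained = Canopy.initialCanopy(input);

        System.out.println("constructor     -> " + converted.getCenter().getClass().getSimpleName());
        System.out.println("initialCanopy() -> " + retained.getCenter().getClass().getSimpleName());
    }
}
```

In this sketch the values survive both paths; only the wrapper type, and with it the name, is lost when the center is copied into the sparse stand-in.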