[ 
https://issues.apache.org/jira/browse/MAHOUT-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018122#comment-13018122
 ] 

Jeff Eastman commented on MAHOUT-552:
-------------------------------------

The initial processing step (createCanopyFromVectors) was using the default 
MeanShiftCanopy constructor, which inherits the AbstractCanopy default 
constructor, which in turn converts all centers to RandomAccessSparseVectors. 
Since the clustering (classification) step is done from these initial canopies 
rather than from the original input vectors, the type of the incoming vectors 
was lost. This is especially problematic when the input vector is a 
NamedVector, because its name is discarded along with its type.

I've created a static method, initialCanopy(), for this initial step; it 
retains the original type of the input vector used as the center. I've added a 
unit test and verified that the type is retained. Committing shortly.
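
To illustrate the difference, here is a minimal, self-contained sketch. It 
does not use Mahout's classes: Vec, SparseVec, DenseVec, NamedVec, Canopy and 
the converting() factory are hypothetical stand-ins, and the initialCanopy() 
shown here only borrows the name of the new method. Its signature and body are 
assumptions, not the committed code.

```java
// Hypothetical stand-ins for illustration only; these are NOT Mahout's classes.
interface Vec {
    double[] values();
}

final class DenseVec implements Vec {
    private final double[] values;
    DenseVec(double[] values) { this.values = values.clone(); }
    public double[] values() { return values.clone(); }
}

// Plays the role of a sparse vector built by copying another vector's values.
final class SparseVec implements Vec {
    private final double[] values;
    SparseVec(Vec other) { this.values = other.values(); } // copies values only
    public double[] values() { return values.clone(); }
}

// Plays the role of a named vector: a name wrapped around a delegate vector.
final class NamedVec implements Vec {
    private final String name;
    private final Vec delegate;
    NamedVec(String name, Vec delegate) { this.name = name; this.delegate = delegate; }
    String name() { return name; }
    public double[] values() { return delegate.values(); }
}

final class Canopy {
    private final Vec center;
    private Canopy(Vec center) { this.center = center; }

    // Mimics the reported behaviour: the center is always converted to the
    // sparse type, so any wrapper type (and its name) is dropped.
    static Canopy converting(Vec input) { return new Canopy(new SparseVec(input)); }

    // Mimics the idea of the fix (assumed form): keep the input vector as the
    // center so its concrete type survives.
    static Canopy initialCanopy(Vec input) { return new Canopy(input); }

    Vec center() { return center; }
}

class CanopyTypeDemo {
    public static void main(String[] args) {
        Vec input = new NamedVec("doc-42", new DenseVec(new double[] {1.0, 2.0}));
        // Converted center is no longer a NamedVec; retained center still is.
        System.out.println(Canopy.converting(input).center() instanceof NamedVec);
        System.out.println(Canopy.initialCanopy(input).center() instanceof NamedVec);
    }
}
```

In this sketch the values survive both paths; only the wrapper type and the 
name are lost on the converting path, which mirrors the symptom reported in 
the issue.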

> AbstractCluster eliminates NamedVectors by replacing them with 
> RandomAccessSparseVector always
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-552
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-552
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Pere Ferrera Bertran
>            Assignee: Jeff Eastman
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-552.patch
>
>
> When clustering with NamedVectors as input (after running seq2sparse with 
> the patch from https://issues.apache.org/jira/browse/MAHOUT-401), names are 
> lost because AbstractCluster replaces the vectors passed to its constructor 
> with RandomAccessSparseVectors.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
