[jira] Commented: (MAHOUT-401) Use NamedVector in seq2sparse

Drew Farris (JIRA) Fri, 02 Jul 2010 05:31:24 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884656#action_12884656 ]


Drew Farris commented on MAHOUT-401:
------------------------------------

bq. So does this mean that seq2sparse will always put out NamedVectors? What 
about when there is no name desired or needed? Is it set to be optional? 

It's not optional at this point, but making it optional is certainly a 
reasonable thing to do. I'll see what kind of patch I can get together for this.

> Use NamedVector in seq2sparse
> -----------------------------
>
>                 Key: MAHOUT-401
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-401
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: Drew Farris
>            Assignee: Drew Farris
>             Fix For: 0.4
>
>         Attachments: MAHOUT-401.patch, pv.patch
>
>
> In seq2sparse, TFIDFPartialVectorReducer and TFPartialVectorReducer should 
> write NamedVectors. Without labels on the vectors fed to k-means, the 
> cluster-dumper no longer prints the original document ids for points.
> See: 
> http://lucene.472066.n3.nabble.com/where-are-the-points-in-each-cluster-kmeans-clusterdump-td838683.html#a845600
> I wonder if this is also an issue with the code that generates vectors from 
> Lucene indexes?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
