[jira] [Commented] (MAHOUT-695) Option to determine number of words for LDADriver from a specified dictionary

Mat Kelcey (JIRA) Sat, 14 May 2011 23:13:33 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033651#comment-13033651
 ]


Mat Kelcey commented on MAHOUT-695:
-----------------------------------

Here's another patch, this one for determining the number of words from the 
first vector. 

I've left the numwords option in as a form of deprecation so that a warning 
can be given. The alternative, taking the option out, would make the driver 
fail at startup complaining about the unknown arg. So, depending on how much 
backwards compatibility you're after, this might not be needed...
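
For illustration only, the behaviour described above could be sketched roughly 
as follows. This is not the code in the patch: it is a library-independent 
Python sketch, and the names resolve_num_words and num_words_option are 
hypothetical. It assumes each input vector reports its full cardinality (the 
vocabulary size) as its length, and that a supplied numwords value is ignored 
apart from triggering the warning.

```python
import warnings
from typing import Iterable, Optional, Sequence


def resolve_num_words(vectors: Iterable[Sequence[float]],
                      num_words_option: Optional[int] = None) -> int:
    """Return the vocabulary size, taken from the first input vector.

    Hypothetical helper: `num_words_option` stands in for the deprecated
    numwords argument. It is still accepted so that old command lines keep
    working, but (by assumption) its value is ignored.
    """
    if num_words_option is not None:
        # Accept the old option and warn, instead of failing at startup
        # with an "unknown argument" error.
        warnings.warn(
            "numwords is deprecated; the number of words is now taken "
            "from the first input vector",
            DeprecationWarning,
            stacklevel=2,
        )

    # Sniff only the first vector; its length is assumed to be the
    # cardinality shared by every vector in the input.
    for first_vector in vectors:
        return len(first_vector)

    raise ValueError("input contains no vectors")
```

Reading only the first vector keeps startup cheap, at the cost of trusting 
that every vector in the input shares the same cardinality.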

> Option to determine number of words for LDADriver from a specified dictionary
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-695
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-695
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Mat Kelcey
>            Priority: Minor
>         Attachments: mahout-695-sniff-vector.patch, mahout-695.patch, 
> mahout-695.patch
>
>
> It bugged me that you needed to specify the number of words directly to the 
> LDADriver, e.g.
>   ./bin/mahout lda \
>      -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
>      -o ./examples/bin/work/reuters-lda -k 20 -v 50000 -ow -x 20 
> With this patch you can instead provide a dictionary; we just count the terms 
> in the dictionary, e.g.
>   ./bin/mahout lda \
>      -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
>      -o ./examples/bin/work/reuters-lda \
>      -d ./examples/bin/work/reuters-out-seqdir-sparse/dictionary.file-0 \
>      -k 20 -ow -x 20 
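
As a rough illustration of "count the terms in the dictionary", here is a 
library-independent Python sketch. It is not the patch itself: 
count_dictionary_terms is a hypothetical name, and the dictionary is assumed 
to have already been read into (term, id) pairs; reading the actual 
dictionary.file-0 from disk is deliberately left out.

```python
from typing import Iterable, Tuple


def count_dictionary_terms(entries: Iterable[Tuple[str, int]]) -> int:
    """Count distinct terms in an already-loaded dictionary.

    `entries` is assumed to yield (term, id) pairs; the ids are not
    inspected, only the terms are counted.
    """
    return len({term for term, _ in entries})


# Toy dictionary, purely for illustration.
toy_dictionary = [("oil", 0), ("price", 1), ("trade", 2)]
num_words = count_dictionary_terms(toy_dictionary)
```

Counting terms gives the right vector cardinality only if the term ids are 
dense (0 to n-1); if they are not, the largest id plus one would be the safer 
value.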

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
