[ 
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033457#comment-13033457
 ] 

Jake Mannix commented on MAHOUT-695:
------------------------------------

But awesome work, thanks, this is great; I've often been annoyed by this 
missing feature too.  What would be even cooler?  If no dictionary is 
given, we just sniff the first vector in the data set and ask for its 
getSize()!
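
A rough sketch of that fallback, in plain Java so it runs without Mahout or 
Hadoop on the classpath. Everything here is an assumption for illustration: 
resolveNumWords is a hypothetical helper, the Map stands in for the dictionary 
file, and double[] stands in for a Mahout vector (its length plays the role of 
the vector's size). It also assumes the vectors were built against the full 
vocabulary, so the first vector's cardinality matches the dictionary size.

```java
import java.util.List;
import java.util.Map;

public class NumWordsSketch {

    // Hypothetical helper, not part of the patch: choose the vocabulary size.
    // dictionary: term -> index, or null when no dictionary was supplied.
    // vectors: dense term-frequency arrays standing in for Mahout vectors.
    static int resolveNumWords(Map<String, Integer> dictionary, List<double[]> vectors) {
        if (dictionary != null) {
            // Dictionary given: the number of words is the number of terms in it.
            return dictionary.size();
        }
        if (vectors == null || vectors.isEmpty()) {
            throw new IllegalArgumentException("no dictionary and no vectors to sniff");
        }
        // No dictionary: sniff the first vector and use its cardinality.
        return vectors.get(0).length;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = Map.of("apple", 0, "bank", 1, "crude", 2);
        List<double[]> docs = List.of(new double[] {1, 0, 2, 0, 0});

        System.out.println(resolveNumWords(dict, docs)); // dictionary wins
        System.out.println(resolveNumWords(null, docs)); // falls back to the first vector
    }
}
```

An explicit word count on the command line could still take precedence over 
both; the sketch only covers the two cases discussed here.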

> Option to determine number of words for LDADriver from a specified dictionary
> -----------------------------------------------------------------------------
>
>                 Key: MAHOUT-695
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-695
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Mat Kelcey
>            Priority: Minor
>         Attachments: mahout-695.patch, mahout-695.patch
>
>
> It bugged me that you needed to specify the number of words directly to the 
> LDADriver, 
> e.g. ./bin/mahout lda \
>      -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
>      -o ./examples/bin/work/reuters-lda -k 20 -v 50000 -ow -x 20 
> With this patch you can instead provide a dictionary; we just count the terms 
> in the dictionary, 
> e.g. ./bin/mahout lda \
>      -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
>      -o ./examples/bin/work/reuters-lda \
>      -d ./examples/bin/work/reuters-out-seqdir-sparse/dictionary.file-0 \
>      -k 20 -ow -x 20 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
