[
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mat Kelcey updated MAHOUT-695:
------------------------------
Attachment: mahout-695.patch
I have removed the NUM_WORDS option completely. This will break existing
callers, since passing it now fails as an unknown parameter. (I'm not sure
whether backwards compatibility is an issue at this stage.) I am happy to
re-include code that ignores it and logs a warning that it is deprecated.
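For illustration, the "ignore it with a warning" fallback could look roughly
like the sketch below. It is written in Python with argparse rather than
Mahout's own Java option parsing, and the function name `parse_args` and the
long option names are hypothetical; only the short flags -i, -o, -k and -v
come from the example commands in this issue.

```python
import argparse
import warnings


def parse_args(argv):
    """Parse LDA-style options; accept the deprecated -v flag but ignore it."""
    parser = argparse.ArgumentParser(prog="lda-sketch")
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--output", required=True)
    parser.add_argument("-k", "--numTopics", type=int, required=True)
    # Deprecated option (hypothetical long name): still accepted so that old
    # invocations do not fail with "unknown parameter", but hidden from help.
    parser.add_argument("-v", "--numWords", type=int, default=None,
                        help=argparse.SUPPRESS)

    args = parser.parse_args(argv)
    if args.numWords is not None:
        warnings.warn(
            "-v/--numWords is deprecated and ignored; the number of words "
            "is now determined from the input vectors",
            DeprecationWarning,
            stacklevel=2,
        )
        # Drop the value so nothing downstream can depend on it.
        args.numWords = None
    return args
```

With this approach an old command line that still passes -v 50000 keeps
working and only emits a deprecation warning.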
> Option to determine number of words for LDADriver from a specified dictionary
> -----------------------------------------------------------------------------
>
> Key: MAHOUT-695
> URL: https://issues.apache.org/jira/browse/MAHOUT-695
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.5
> Reporter: Mat Kelcey
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.6
>
> Attachments: mahout-695.patch, mahout-695.patch
>
>
> It bugged me that you needed to specify the number of words directly to the
> LDADriver, e.g.
> ./bin/mahout lda \
> -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
> -o ./examples/bin/work/reuters-lda -k 20 -v 50000 -ow -x 20
> With this patch, the LDADriver just checks a vector from the input and
> determines the number of words from its size, e.g.
> ./bin/mahout lda \
> -i ./examples/bin/work/reuters-out-seqdir-sparse/tf-vectors \
> -o ./examples/bin/work/reuters-lda -k 20 -ow -x 20
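The idea in the description, reading one input vector and using its size as
the number of words, can be sketched as follows. This is a minimal Python
illustration, not the patch itself: `SparseVector` and `infer_num_words` are
hypothetical stand-ins, and the sketch assumes every input vector declares the
same cardinality (the dictionary size), so the first one is representative.

```python
from typing import Dict, Iterable, NamedTuple


class SparseVector(NamedTuple):
    """Hypothetical stand-in for a term-frequency vector."""
    cardinality: int           # declared dimensionality, i.e. vocabulary size
    weights: Dict[int, float]  # term index -> term frequency


def infer_num_words(vectors: Iterable[SparseVector]) -> int:
    """Return the vocabulary size taken from the first vector in the input."""
    for vector in vectors:
        # Assumption: all vectors share one cardinality, so one is enough.
        return vector.cardinality
    raise ValueError("input contains no vectors; cannot infer number of words")


docs = [
    SparseVector(50000, {3: 2.0, 41: 1.0}),
    SparseVector(50000, {7: 1.0}),
]
num_words = infer_num_words(docs)
```

Note that the cardinality, not the number of non-zero entries, is what matters
here: a sparse document vector usually stores far fewer entries than there are
words in the dictionary.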
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira