minDF and maxDFPercent filtering doesn't get applied when output weight is tf in
SparseVectorsFromSequenceFiles
-----------------------------------------------------------------------------------------------------------
Key: MAHOUT-962
URL: https://issues.apache.org/jira/browse/MAHOUT-962
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.6
Reporter: John Conwell
Fix For: 0.6
The reasoning here is similar to that behind the fix for MAHOUT-957. The
desired output is term frequency (tf) vectors, but with terms filtered by their
minDF and maxDFPercent values. This is a valid use case for LDA, where tf
vectors are the desired input but dropping terms above maxDFPercent is still useful.
Currently minDF and maxDFPercent are only used when calculating tfidf, and the
original tf vectors are not updated to reflect the term filtering.
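
To make the requested behavior concrete, here is a minimal sketch in plain Python. It is not Mahout code: the function names, the dict-based vector representation, and the reading of maxDFPercent as a percentage of the total document count are all assumptions made for illustration. The point it shows is that the term counts stay as raw tf values, while terms outside the document frequency bounds are dropped.

```python
from collections import Counter
from typing import Dict, List


def document_frequencies(tf_vectors: List[Dict[str, int]]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for vec in tf_vectors:
        df.update(term for term, count in vec.items() if count > 0)
    return df


def prune_tf_vectors(
    tf_vectors: List[Dict[str, int]],
    min_df: int = 1,
    max_df_percent: float = 100,
) -> List[Dict[str, int]]:
    """Drop terms whose document frequency is below min_df or above
    max_df_percent of the corpus. Term counts are left unchanged.

    Hypothetical helper: the parameter names mirror the minDF and
    maxDFPercent options, but the semantics here are an assumption.
    """
    df = document_frequencies(tf_vectors)
    max_df = len(tf_vectors) * max_df_percent / 100.0
    keep = {term for term, n in df.items() if min_df <= n <= max_df}
    return [
        {term: count for term, count in vec.items() if term in keep}
        for vec in tf_vectors
    ]


# Toy corpus: "the" appears in every document, "mahout" and "vector" in one.
docs = [
    {"the": 3, "lda": 1, "topic": 2},
    {"the": 2, "mahout": 1, "topic": 1},
    {"the": 1, "vector": 4},
    {"the": 5, "lda": 2},
]
pruned = prune_tf_vectors(docs, min_df=2, max_df_percent=75)
```

With these settings only "lda" and "topic" survive: "the" exceeds 75 percent of the documents, and "mahout" and "vector" each appear in fewer than two.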