minDF and maxDFPercent filtering doesn't get applied when output weight is tf in 
SparseVectorsFromSequenceFiles
-----------------------------------------------------------------------------------------------------------

                 Key: MAHOUT-962
                 URL: https://issues.apache.org/jira/browse/MAHOUT-962
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.6
            Reporter: John Conwell
             Fix For: 0.6


The reasoning here is the same as for the fix in MAHOUT-957.  The desired 
output is term frequency (tf) vectors, but with terms filtered by their minDF 
and maxDFPercent values. This is useful for LDA, for example, which takes tf 
vectors as input but still benefits from dropping terms above maxDFPercent.

Currently minDF and maxDFPercent are applied only when calculating tf-idf, and 
the original tf vectors are not updated to reflect the term filtering.
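
To illustrate the requested behavior, here is a minimal, self-contained sketch 
of document-frequency filtering applied directly to tf vectors. It is not 
Mahout code: the class TfDfFilter, its methods, and the use of plain maps in 
place of Mahout vectors are all hypothetical. It also assumes that minDF is an 
inclusive lower bound on the number of documents containing a term, and that 
maxDFPercent is an inclusive upper bound expressed as an integer percentage of 
all documents; the actual boundary handling in Mahout may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Mahout.
final class TfDfFilter {
  private TfDfFilter() {}

  // Counts, for each term, the number of documents it appears in.
  static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> tfVectors) {
    Map<String, Integer> df = new HashMap<>();
    for (Map<String, Integer> doc : tfVectors) {
      for (String term : doc.keySet()) {
        df.merge(term, 1, Integer::sum);
      }
    }
    return df;
  }

  // Returns copies of the tf vectors that keep only terms whose document
  // frequency lies within [minDF, maxDFPercent% of all documents].
  // The raw term counts are left unchanged; no idf weighting is applied.
  static List<Map<String, Integer>> filter(
      List<Map<String, Integer>> tfVectors, int minDF, int maxDFPercent) {
    Map<String, Integer> df = documentFrequencies(tfVectors);
    long numDocs = tfVectors.size();
    List<Map<String, Integer>> filtered = new ArrayList<>();
    for (Map<String, Integer> doc : tfVectors) {
      Map<String, Integer> kept = new HashMap<>();
      for (Map.Entry<String, Integer> entry : doc.entrySet()) {
        int termDf = df.get(entry.getKey());
        boolean frequentEnough = termDf >= minDF;
        // Integer arithmetic avoids rounding: df / numDocs <= maxDFPercent / 100.
        boolean notTooCommon = termDf * 100L <= (long) maxDFPercent * numDocs;
        if (frequentEnough && notTooCommon) {
          kept.put(entry.getKey(), entry.getValue());
        }
      }
      filtered.add(kept);
    }
    return filtered;
  }
}
```

With four documents, minDF = 2 and maxDFPercent = 75, a term present in every 
document is dropped for exceeding the maximum, a term present in one document 
is dropped for falling below the minimum, and a term present in two documents 
keeps its original counts.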


        
