What happens with '--stripQuoted true'? Without it, the vectors are flooded with redundant quoted text.
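To show why stripping matters, here is a minimal Java sketch. It assumes that quoted reply text is any line starting with optional whitespace and then `>`, which is the common email convention. `QuoteStripper` and `stripQuoted` are hypothetical names; this is not how Mahout's `--stripQuoted` option is actually implemented. Once those lines are dropped, the vectorizer sees only each message's own text, and the same quoted paragraphs are not counted again in every reply of a thread.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not Mahout's API): removes quoted reply lines
 * from an email body before the body is tokenized and vectorized.
 */
public final class QuoteStripper {

    // Assumption: a quoted line starts with optional whitespace followed by '>'.
    private static final Pattern QUOTED_LINE = Pattern.compile("^\\s*>.*");

    private QuoteStripper() {}

    /** Returns the body with every quoted line removed (requires Java 11+ for String.lines()). */
    public static String stripQuoted(String body) {
        return body.lines()
                .filter(line -> !QUOTED_LINE.matcher(line).matches())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String body = "What happens with this flag?\n"
                + "> Integrated in Mahout-Quality #1530\n"
                + ">> ASF Email Classification Examples don't always produce good results\n"
                + "-- Lance";
        // Only the unquoted lines survive.
        System.out.println(stripQuoted(body));
    }
}
```

Matching whole lines keeps the sketch simple. A real mail parser would also need to handle attribution lines such as "On ... wrote:", which this pattern deliberately leaves alone.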
On Tue, Jun 5, 2012 at 7:05 AM, Hudson (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/MAHOUT-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289441#comment-13289441 ]
>
> Hudson commented on MAHOUT-939:
> -------------------------------
>
> Integrated in Mahout-Quality #1530 (See [https://builds.apache.org/job/Mahout-Quality/1530/])
>     MAHOUT-939 Remove warnings from the asf example script (Revision 1346373)
>
>      Result = SUCCESS
> robinanil :
> http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1346373
> Files :
> * /mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/ComplementaryNaiveBayesClassifier.java
> * /mahout/trunk/examples/bin/asf-email-examples.sh
>
>
>> ASF Email Classification Examples don't always produce good results
>> -------------------------------------------------------------------
>>
>>                 Key: MAHOUT-939
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-939
>>             Project: Mahout
>>          Issue Type: Bug
>>    Affects Versions: 0.6
>>            Reporter: Grant Ingersoll
>>            Assignee: Grant Ingersoll
>>              Labels: MAHOUT_INTRO_CONTRIBUTE
>>             Fix For: 0.7
>>
>>         Attachments: 939.patch, MAHOUT-939.patch, MAHOUT-939.patch,
>> MAHOUT-939.patch, asf_sample_list.txt, bayes.patch, strip_reject.patch
>>
>>
>> The classification examples for the ASF email don't work all that well
>> currently in terms of quality when it comes to more than a few labels.
>> Also, need to determine how much memory is required for vectors of
>> cardinality size 100K.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>

-- 
Lance Norskog
[email protected]
