Benchmark Mahout's clustering performance on EC2 and publish the results
------------------------------------------------------------------------

                 Key: MAHOUT-588
                 URL: https://issues.apache.org/jira/browse/MAHOUT-588
             Project: Mahout
          Issue Type: Task
            Reporter: Grant Ingersoll


For Taming Text, I've commissioned some benchmarking work on Mahout's 
clustering algorithms.  I've asked the two people doing the project to do all 
the work in the open here.  The goal is to use a publicly reusable dataset 
(for now, the ASF mail archives, assuming they are big enough), run the 
benchmarks on EC2, and make all resources available so others can reproduce 
or improve on the results.

I'd like to add the setup code to utils (although it could possibly be done as 
a Vectorizer), and the results will be published on the Wiki as well as in 
the book.  This issue is to track the patches, etc.

