Re: [jira] Updated: (MAHOUT-137) Convert Clustering Algs to use Vector Writable

Jeff Eastman Mon, 22 Jun 2009 16:10:55 -0700

Looks like you missed Sean's morning commit making Vector Cloneable, but otherwise the patch applied cleanly.

The stuff after BufferedReader looks to be comparing expected reducer output with actual. It's not very readable, though.

From my performance test, the optimization I added to AbstractVector to cache the class and the subset optimization you added in vectorNameToVector are not justified. I ran 100k iterations of serializing/deserializing small vectors with and without my optimization and the performance was indistinguishable. I conclude the class lookup is already being cached by the JDK.
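
For anyone who wants to repeat that kind of comparison, here is a rough sketch of a timing loop. It is not the test I ran and it does not use Mahout or Hadoop classes: TinyVector, serialize, deserialize and timeNanos are hypothetical stand-ins, and the only thing being varied is whether the class is resolved with Class.forName on every read or passed in already resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassLookupTiming {

    // Hypothetical stand-in for a small dense vector; not Mahout's Vector.
    public static class TinyVector {
        public double[] values = new double[0];

        public void write(DataOutputStream out) throws IOException {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }

        public void readFields(DataInputStream in) throws IOException {
            values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
        }
    }

    // Writes the class name followed by the vector payload.
    public static byte[] serialize(TinyVector v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(v.getClass().getName());
        v.write(out);
        out.flush();
        return bytes.toByteArray();
    }

    // If cached is null the class is looked up by name on every call;
    // otherwise the supplied class is used and the lookup is skipped.
    public static TinyVector deserialize(byte[] data, Class<?> cached) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String name = in.readUTF();
        Class<?> cls = (cached != null) ? cached : Class.forName(name);
        TinyVector v = (TinyVector) cls.getDeclaredConstructor().newInstance();
        v.readFields(in);
        return v;
    }

    // Round-trips one small vector the given number of times and returns elapsed nanoseconds.
    public static long timeNanos(int iterations, Class<?> cached) throws Exception {
        TinyVector v = new TinyVector();
        v.values = new double[] {1.0, 2.0, 3.0};
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserialize(serialize(v), cached);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int n = 100000;
        // Warm-up passes so JIT compilation does not favor whichever variant runs second.
        timeNanos(n, null);
        timeNanos(n, TinyVector.class);
        System.out.println("lookup each time: " + timeNanos(n, null) / 1000000 + " ms");
        System.out.println("cached class:     " + timeNanos(n, TinyVector.class) / 1000000 + " ms");
    }
}
```

Timings from a loop like this are noisy, so treat a small difference in either direction as no difference.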

I'd suspect Writable identity issues in your test code but I can't find any. They're plaguing me big time with MeanShift.
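
To illustrate the kind of problem I mean (this is an assumption about the failure mode, not a diagnosis of the patch): Hadoop hands the reducer the same value instance on every iteration and just refills it, so any code that keeps the reference instead of a copy ends up with N aliases of the last value. The sketch below simulates that without Hadoop; Holder, reusingIterator and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReusedInstanceDemo {

    // Hypothetical mutable holder standing in for a Writable vector.
    public static class Holder implements Cloneable {
        public double value;

        @Override
        public Holder clone() {
            Holder copy = new Holder();
            copy.value = value;
            return copy;
        }
    }

    // Returns an iterator that refills and returns one shared instance on each next(),
    // mimicking how a reducer's value iterator reuses its object.
    public static Iterator<Holder> reusingIterator(final double[] data) {
        final Holder shared = new Holder();
        return new Iterator<Holder>() {
            private int i = 0;

            public boolean hasNext() {
                return i < data.length;
            }

            public Holder next() {
                shared.value = data[i++];
                return shared;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Collects the iterated values, either keeping the raw reference or a clone.
    public static List<Holder> collect(Iterator<Holder> it, boolean copy) {
        List<Holder> out = new ArrayList<Holder>();
        while (it.hasNext()) {
            Holder h = it.next();
            out.add(copy ? h.clone() : h);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        List<Holder> aliased = collect(reusingIterator(data), false);
        List<Holder> copied = collect(reusingIterator(data), true);
        System.out.println(aliased.get(0).value); // prints 3.0: every element is the same object
        System.out.println(copied.get(0).value);  // prints 1.0: each element is an independent copy
    }
}
```

If something like this is going on, cloning each vector before storing it is the usual fix, which is where Vector being Cloneable comes in handy.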


I'm going to let my brain unwind for a while and try again.


Grant Ingersoll (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-137:
-----------------------------------

    Attachment: MAHOUT-137.patch

Draft of KMeans conversion. Most tests pass except testKMeansReducer and testKMeansMRJob.

In reading the testKMeansMRJob() it is not clear to me what that last part of 
the test is doing (after the BufferedReader)

As for the Reducer test, I'm not sure why the Centers aren't matching up.

Some extra eyes would be appreciated.

Convert Clustering Algs to use Vector Writable
----------------------------------------------

                Key: MAHOUT-137
                URL: https://issues.apache.org/jira/browse/MAHOUT-137
            Project: Mahout
         Issue Type: Improvement
           Reporter: Grant Ingersoll
           Assignee: Grant Ingersoll
            Fix For: 0.2

        Attachments: MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch


All M/R jobs should use Vector writable instead of encoding and decoding 
strings.  We can have a separate utility that converts serialized GSON, 
Strings, whatever into the appropriate vectors.  See MAHOUT-136 and 
http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable
