The code after the BufferedReader looks like it compares the expected reducer output with the actual output. It's not very readable, though.
From my performance test, neither the optimization I added to AbstractVector to cache the class nor the subset optimization you added in vectorNameToVector is justified. I ran 100k iterations of serializing/deserializing small vectors with and without my optimization and the performance was indistinguishable. I conclude the class is already being cached by the JDK.
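For reference, here is a rough sketch of the kind of micro-benchmark described above. It is not the actual test: the class name ClassLookupBench, the wire format (class name, length, then doubles), and the use of java.util.ArrayList as a stand-in class name are all assumptions made for illustration, and it does not touch AbstractVector or vectorNameToVector. It only compares calling Class.forName on every deserialization against looking the class up in a small map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ClassLookupBench {
    // Hypothetical cache keyed by class name (an assumption, not Mahout's code).
    static final Map<String, Class<?>> CACHE = new HashMap<>();

    // Write a class name followed by the vector values (assumed wire format).
    static byte[] serialize(String className, double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(className);
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }
        return bytes.toByteArray();
    }

    // Resolve the class either through the map or through Class.forName each time.
    static Class<?> lookup(String name, boolean cached) throws ClassNotFoundException {
        if (!cached) {
            return Class.forName(name);
        }
        Class<?> c = CACHE.get(name);
        if (c == null) {
            c = Class.forName(name);
            CACHE.put(name, c);
        }
        return c;
    }

    // Read the class name, resolve it, then read the values back.
    static double[] deserialize(byte[] data, boolean cached)
            throws IOException, ClassNotFoundException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            lookup(in.readUTF(), cached);
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        }
    }

    // Time a number of serialize/deserialize round trips on a small vector.
    static long timeNanos(int iterations, boolean cached) throws Exception {
        double[] vector = {1.0, 2.0, 3.0};
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            deserialize(serialize("java.util.ArrayList", vector), cached);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Warm-up passes so JIT compilation does not skew the measured runs.
        timeNanos(100_000, false);
        timeNanos(100_000, true);
        System.out.println("uncached ns: " + timeNanos(100_000, false));
        System.out.println("cached ns:   " + timeNanos(100_000, true));
    }
}
```

A loop like this is only a coarse measurement. The timings vary from run to run, so a small difference in either direction should not be read as significant.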
I'd suspect Writable identity issues in your test code, but I can't find any. That kind of problem is plaguing me big time with MeanShift.
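To illustrate what I mean by an identity issue, here is a sketch that runs without Hadoop. It is not a diagnosis of the failing tests. MutableVector and reusingIterator are hypothetical stand-ins for a Writable vector and for a reduce-side value iterator that refills one shared instance on each call to next(). Holding on to the references without copying leaves every collected element pointing at the last value read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class WritableReuseDemo {
    // Hypothetical mutable value, standing in for a Writable vector.
    static final class MutableVector {
        double[] values = new double[0];

        void set(double[] v) {
            values = v.clone();
        }

        MutableVector copy() {
            MutableVector c = new MutableVector();
            c.set(values);
            return c;
        }
    }

    // Iterator that overwrites and returns the same instance on every next().
    static Iterator<MutableVector> reusingIterator(final double[][] rows) {
        final MutableVector shared = new MutableVector();
        return new Iterator<MutableVector>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < rows.length;
            }

            @Override
            public MutableVector next() {
                shared.set(rows[i++]);
                return shared;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Collect the values, optionally taking a defensive copy of each one.
    static List<MutableVector> collect(Iterator<MutableVector> it, boolean copy) {
        List<MutableVector> out = new ArrayList<>();
        while (it.hasNext()) {
            MutableVector v = it.next();
            out.add(copy ? v.copy() : v);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 1}, {2, 2}, {3, 3}};
        List<MutableVector> aliased = collect(reusingIterator(rows), false);
        List<MutableVector> copied = collect(reusingIterator(rows), true);
        System.out.println(aliased.get(0).values[0]); // prints 3.0: all entries alias the last value
        System.out.println(copied.get(0).values[0]);  // prints 1.0: the copy kept the first value
    }
}
```

If something like this is happening, the centers computed from a collected list would all equal the last point seen, which would make an expected-versus-actual comparison fail.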
I'm going to let my brain unwind for a while and try again.

Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-137:
-----------------------------------

    Attachment: MAHOUT-137.patch

Draft of KMeans conversion. Most tests pass except testKMeansReducer and testKMeansMRJob.

In reading the testKMeansMRJob() it is not clear to me what that last part of the test is doing (after the BufferedReader). As for the Reducer test, I'm not sure why the Centers aren't matching up. Some extra eyes would be appreciated.

Convert Clustering Algs to use Vector Writable
----------------------------------------------

    Key: MAHOUT-137
    URL: https://issues.apache.org/jira/browse/MAHOUT-137
    Project: Mahout
    Issue Type: Improvement
    Reporter: Grant Ingersoll
    Assignee: Grant Ingersoll
    Fix For: 0.2
    Attachments: MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch

All M/R jobs should use Vector writable instead of encoding and decoding strings. We can have a separate utility that converts serialized GSON, Strings, whatever into the appropriate vectors. See MAHOUT-136 and http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable
