Grant Ingersoll (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723067#action_12723067 ]

Grant Ingersoll commented on MAHOUT-137:
----------------------------------------

I see the problem now with KMeans (and likely Fuzzy KMeans), and it is a source of confusion. Namely, it's the whole relationship between Cluster.center and Cluster.centroid. It seems that as the Cluster goes from formatCluster through decodeCluster, the centroid (computed in formatCluster) becomes the center for the next time around. In testKMeansReducer, this never happens, since we aren't serializing through the string layer.

Obviously, I can correct this in the test, but it seems a bit strange. AIUI, the center holds the current iteration's center, and the centroid is where the center is being moved to, right? This does indeed happen in my implementation of Writable, but since that isn't being called in the test, it doesn't occur.

Convert Clustering Algs to use Vector Writable
----------------------------------------------

Key: MAHOUT-137
URL: https://issues.apache.org/jira/browse/MAHOUT-137
Project: Mahout
Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Fix For: 0.2
Attachments: MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch

All M/R jobs should use Vector writable instead of encoding and decoding strings. We can have a separate utility that converts serialized GSON, Strings, whatever into the appropriate vectors.

See MAHOUT-136 and http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable
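
To make the center/centroid relationship from the comment above concrete, here is a minimal sketch of it as I read it. This is not Mahout's Cluster class: SketchCluster, addPoint, computeCentroid, format and decode are hypothetical names invented for illustration, and the string encoding is an arbitrary stand-in for the real formatCluster/decodeCluster layer.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a k-means cluster; NOT Mahout's Cluster. */
class SketchCluster {
    // Center used to assign points during the current iteration.
    private final double[] center;
    // Running sum of the points assigned to this cluster in this iteration.
    private final double[] pointTotal;
    private int numPoints;

    SketchCluster(double[] center) {
        this.center = center.clone();
        this.pointTotal = new double[center.length];
    }

    /** Accumulates a point; the current center is left untouched. */
    void addPoint(double[] point) {
        for (int i = 0; i < pointTotal.length; i++) {
            pointTotal[i] += point[i];
        }
        numPoints++;
    }

    /** Where the center is moving to: the mean of the assigned points. */
    double[] computeCentroid() {
        if (numPoints == 0) {
            return center.clone(); // nothing assigned, so the center does not move
        }
        double[] centroid = new double[center.length];
        for (int i = 0; i < centroid.length; i++) {
            centroid[i] = pointTotal[i] / numPoints;
        }
        return centroid;
    }

    /** Assumed analogue of the string layer: only the centroid is written out. */
    String format() {
        return Arrays.toString(computeCentroid());
    }

    /** Assumed analogue of decoding: the serialized centroid becomes the new center. */
    static SketchCluster decode(String encoded) {
        String body = encoded.substring(1, encoded.length() - 1); // strip "[" and "]"
        String[] parts = body.split(",");
        double[] newCenter = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            newCenter[i] = Double.parseDouble(parts[i].trim());
        }
        return new SketchCluster(newCenter);
    }

    double[] getCenter() {
        return center.clone();
    }
}
```

In this sketch, a test that only calls addPoint and then reads getCenter sees the old center, while one that goes through decode(format()) sees the centroid promoted to center. That would be the same kind of mismatch as the one described for testKMeansReducer, assuming the real classes behave as the comment suggests.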
