[ 
https://issues.apache.org/jira/browse/MAHOUT-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760462#action_12760462
 ] 

Jeff Eastman commented on MAHOUT-136:
-------------------------------------

I think this issue has been completed and should be closed, since Canopy now 
uses Vector Writable to communicate the centroid vectors between the mapper 
and reducer. What it does not do is transmit Writable Canopies between the map 
and reduce steps, as kmeans does. There is an implementation of the Writable 
methods for Canopy (IMHO it is not correct, since it sets the point total and 
count to nonzero values), but the mapper and reducer do not use them, so this 
is moot. Converting the mapper and reducer to communicate writable canopies 
can be done, but there are a lot of annoying little complications in the 
driver, which currently goes to some lengths to use the same vector form 
(dense, sparse) as the input data.
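
For anyone unfamiliar with the distinction, here is a rough sketch of what 
"writable" versus text serialization of a centroid means. It is not the Mahout 
code: CanopySketch, asText, toBytes and fromBytes are made-up names, the 
write/readFields pair only mirrors the shape of Hadoop's Writable interface 
using plain java.io, and it assumes a dense center only (no sparse form).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a canopy; NOT the Mahout Canopy class.
public class CanopySketch {
    private int canopyId;
    private double[] center; // assumption: dense representation only

    public CanopySketch() {
        this(0, new double[0]);
    }

    public CanopySketch(int canopyId, double[] center) {
        this.canopyId = canopyId;
        this.center = center.clone();
    }

    public int getId() {
        return canopyId;
    }

    public double[] getCenter() {
        return center.clone();
    }

    // Same shape as Writable.write(DataOutput): fixed-width binary fields.
    public void write(DataOutput out) throws IOException {
        out.writeInt(canopyId);
        out.writeInt(center.length);
        for (double v : center) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields(DataInput): reads what write() wrote.
    public void readFields(DataInput in) throws IOException {
        canopyId = in.readInt();
        center = new double[in.readInt()];
        for (int i = 0; i < center.length; i++) {
            center[i] = in.readDouble();
        }
    }

    // Illustrative text form, standing in for a format-string representation.
    public String asText() {
        return "C" + canopyId + ": " + Arrays.toString(center);
    }

    // Helper: serialize to a byte array using write().
    public static byte[] toBytes(CanopySketch c) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            c.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild an instance from bytes using readFields().
    public static CanopySketch fromBytes(byte[] bytes) {
        try {
            CanopySketch c = new CanopySketch();
            c.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CanopySketch c = new CanopySketch(
            7, new double[] {0.123456789, 1.0 / 3.0, 2.718281828459045});
        byte[] binary = toBytes(c);
        byte[] text = c.asText().getBytes(StandardCharsets.UTF_8);
        System.out.println("binary bytes: " + binary.length);
        System.out.println("text bytes:   " + text.length);
        System.out.println("round trip:   "
            + Arrays.equals(c.getCenter(), fromBytes(binary).getCenter()));
    }
}
```

The binary form costs 8 bytes per double plus two 4-byte ints, and it round 
trips exactly. The size of the text form depends on how many digits each value 
prints with, and it has to be parsed again on the reduce side.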

It works as implemented.

Unless somebody strongly disagrees, I'm going to close this issue as resolved, 
since the real intent was to replace the text representation of the centroid 
vector with the writable version, and that has been done for some time now.

> Change Canopy MR Implementation to use Vector Writable
> ------------------------------------------------------
>
>                 Key: MAHOUT-136
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-136
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>             Fix For: 0.2
>
>
> Internal serialization of Canopy currently uses asFormatString rather than 
> just making the Canopy writable. This is storage inefficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
