So, are we to make these changes on all the Mappers/Reducers?
On Jun 19, 2009, at 8:54 PM, Jeff Eastman (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722105#action_12722105 ]
Jeff Eastman commented on MAHOUT-136:
-------------------------------------
r786738 committed the following changes.
- Modified CanopyMapper and CanopyReducer to produce and consume
Canopy centroids as Writable values rather than the previous format Strings
- Modified CanopyMapper to specify SparseVector output from the mapper
(see the configuration sketch after this list)
- Fixed null name hash() bug in SparseVector
- Modified Canopy.emitPointToExistingCanopies to emit only the canopy id
rather than the full serialized canopy.
- This eliminates the need for the OutputDriver and OutputMapper in the
synthetic control example, so they have been deleted.
- Updated unit tests; all tests run
- Synthetic control example runs
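For anyone wondering what this looks like on the driver side, the job setup is roughly the sketch below. This is a minimal sketch using the old mapred API; the class and package names (CanopyJobSketch, org.apache.mahout.matrix.SparseVector, and the fully qualified canopy classes) are assumptions for illustration, not the committed CanopyDriver code.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public final class CanopyJobSketch {

      private CanopyJobSketch() {
      }

      // Builds a JobConf whose intermediate values are SparseVectors and whose
      // final values are Canopies, both written as Writables rather than as
      // format Strings.
      public static JobConf createJobConf() {
        JobConf conf = new JobConf(CanopyJobSketch.class);
        // map output: point vectors as Writable values
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(org.apache.mahout.matrix.SparseVector.class);
        // reduce output: whole canopies as Writable values
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(org.apache.mahout.clustering.canopy.Canopy.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setMapperClass(org.apache.mahout.clustering.canopy.CanopyMapper.class);
        conf.setReducerClass(org.apache.mahout.clustering.canopy.CanopyReducer.class);
        return conf;
      }
    }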
NOTE: When passing Vectors between Map and Reduce steps using
Writable format, Hadoop uses the *same instance* to do all of the
deserializations. I had to change the Canopy constructors to clone()
their center arguments so that the same instance would not be reused
for multiple canopies.
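To make that concrete, the fix amounts to the pattern sketched below. It is a simplified illustration assuming the Vector interface exposes a public clone(); CanopySketch is a made-up name, not the committed Canopy class.

    import org.apache.mahout.matrix.Vector;

    public class CanopySketch {

      private final Vector center;

      public CanopySketch(Vector center) {
        // The reducer's value iterator hands back the SAME deserialized Vector
        // instance on every call to next(), so take a private copy here.
        // Without the clone(), every canopy built in one reduce call would end
        // up sharing (and mutating) a single center object.
        this.center = (Vector) center.clone();
      }

      public Vector getCenter() {
        return center;
      }
    }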
Change Canopy MR Implementation to use Vector Writable
------------------------------------------------------
Key: MAHOUT-136
URL: https://issues.apache.org/jira/browse/MAHOUT-136
Project: Mahout
Issue Type: Improvement
Components: Clustering
Affects Versions: 0.1
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Fix For: 0.1
Internal serialization of Canopy currently uses asFormatString
rather than simply making Canopy Writable. This is
storage-inefficient.
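For reference, the Writable approach boils down to the sketch below: write the fields in a compact binary form instead of round-tripping through a formatted String. Field names here are illustrative only; the real Canopy carries more state and different types.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    public class CanopyWritableSketch implements Writable {

      private int canopyId;
      private double[] center = new double[0]; // stand-in for the centroid vector

      public void write(DataOutput out) throws IOException {
        out.writeInt(canopyId);
        out.writeInt(center.length);
        for (double d : center) {
          out.writeDouble(d); // compact binary encoding vs. asFormatString text
        }
      }

      public void readFields(DataInput in) throws IOException {
        canopyId = in.readInt();
        center = new double[in.readInt()];
        for (int i = 0; i < center.length; i++) {
          center[i] = in.readDouble();
        }
      }
    }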
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/