[
https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722382#action_12722382
]
Jeff Eastman commented on MAHOUT-137:
-------------------------------------
Evidently, Hadoop needs to know the concrete class so it does not have to
marshal the class name with every instance. That makes sense and is more
efficient, but it will require us to be more clever about using DenseVectors. A
job argument specifying the vector class would do the trick, and we might want
to add another to specify the binary/JSON output encoding so we don't always
have to run an output driver step to get something human-readable.
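
To make the trade-off concrete, here is a rough sketch in plain java.io. It is
not Hadoop or Mahout code: VectorClassArgSketch, DenseVec, and the three helper
methods are hypothetical names invented for illustration. The idea is only that
a record can drop the class name when the reader is told the concrete class
once, for example through a job argument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustration only: plain java.io, no Hadoop or Mahout classes involved. */
class VectorClassArgSketch {

    /** Hypothetical stand-in for a dense vector; NOT Mahout's DenseVector. */
    static class DenseVec {
        double[] values = new double[0];

        DenseVec() {}  // no-arg constructor so the class can be created reflectively

        DenseVec(double... values) { this.values = values; }

        // Writes only the data: a length followed by the raw doubles.
        void write(DataOutputStream out) throws IOException {
            out.writeInt(values.length);
            for (double v : values) out.writeDouble(v);
        }

        // Reads back exactly what write() produced.
        void readFields(DataInputStream in) throws IOException {
            values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        }
    }

    /** Record layout when the reader already knows the class: data only. */
    static byte[] encode(DenseVec v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            v.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Alternative layout: every record carries its class name as a prefix. */
    static byte[] encodeWithClassName(DenseVec v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(v.getClass().getName());
            v.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes a data-only record using a class name supplied once up front. */
    static DenseVec decode(String vectorClassName, byte[] record) {
        try {
            DenseVec v = (DenseVec) Class.forName(vectorClassName)
                    .getDeclaredConstructor().newInstance();
            v.readFields(new DataInputStream(new ByteArrayInputStream(record)));
            return v;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + vectorClassName, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the class name passed to decode() plays the role of the job
argument: it is given once, and each record shrinks by the length of the
encoded name. How the real jobs would carry that argument is left open here.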
> Convert Clustering Algs to use Vector Writable
> ----------------------------------------------
>
> Key: MAHOUT-137
> URL: https://issues.apache.org/jira/browse/MAHOUT-137
> Project: Mahout
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 0.2
>
> Attachments: MAHOUT-137.patch
>
>
> All M/R jobs should use Vector writable instead of encoding and decoding
> strings. We can have a separate utility that converts serialized GSON,
> Strings, whatever into the appropriate vectors. See MAHOUT-136 and
> http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable