[https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722708#action_12722708]
Jeff Eastman commented on MAHOUT-137:
-------------------------------------
Yes, I saw that, and that was my original approach too. I do like the ability
to have the clustering jobs be vector-type agnostic, and pushing the vector
type into a job argument does work. On output, we still need it as a job
argument, since we need to know the type at configuration time. This also lets
us use the same internal form between the mapper and reducer steps of a
clustering job. I agree that users would rather not have to specify it if we
can avoid it; maybe that's the real question for core-user.
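The job-argument approach can be sketched as follows: the vector type is read once from the job configuration, and the mapper and reducer both build vectors through the same factory, so the two steps share one internal form. This is a minimal sketch under stated assumptions: a plain java.util.Properties object stands in for the Hadoop job configuration, and SimpleVector, DenseVec, SparseVec, VectorTypeConfig and the "vector.type" property name are all hypothetical, not Mahout's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.IntFunction;

// Hypothetical minimal vector abstraction (not Mahout's Vector interface).
interface SimpleVector {
  int size();
  double get(int i);
  void set(int i, double v);
}

// Dense storage: one double per dimension.
class DenseVec implements SimpleVector {
  private final double[] values;
  DenseVec(int size) { values = new double[size]; }
  public int size() { return values.length; }
  public double get(int i) { return values[i]; }
  public void set(int i, double v) { values[i] = v; }
}

// Sparse storage: only nonzero entries are kept in a map.
class SparseVec implements SimpleVector {
  private final int size;
  private final Map<Integer, Double> values = new HashMap<>();
  SparseVec(int size) { this.size = size; }
  public int size() { return size; }
  public double get(int i) { return values.getOrDefault(i, 0.0); }
  public void set(int i, double v) {
    if (v == 0.0) values.remove(i); else values.put(i, v);
  }
}

// Resolves the vector type once, at job-configuration time.
// "vector.type" is a hypothetical property name; Properties stands in for a job configuration.
class VectorTypeConfig {
  static final String KEY = "vector.type";
  private static final Map<String, IntFunction<SimpleVector>> FACTORIES = new HashMap<>();
  static {
    FACTORIES.put("dense", DenseVec::new);
    FACTORIES.put("sparse", SparseVec::new);
  }

  // Returns the factory for the configured type, defaulting to dense.
  static IntFunction<SimpleVector> factoryFor(Properties conf) {
    String type = conf.getProperty(KEY, "dense");
    IntFunction<SimpleVector> factory = FACTORIES.get(type);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown vector type: " + type);
    }
    return factory;
  }
}
```

With this design an unknown type fails once, when the job is configured, rather than deep inside a map or reduce task, which fits the point that the output type has to be known at config time.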
I also think it is unlikely that a given clustering application would mix
sparse and dense vectors, though encoding the type per instance would let the
particular encoding be chosen automatically for each vector. Using the
optimized AbstractVector methods on input would add a little storage overhead
to the input data, but would allow this flexibility.
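The per-instance alternative can be sketched by prefixing each serialized vector with a one-byte encoding tag, so a reader picks the dense or sparse layout record by record instead of relying on a job argument. This is a minimal sketch under stated assumptions: TaggedVectorCodec is hypothetical (it is not Mahout's VectorWritable or AbstractVector code), vectors are plain double arrays, and the writer simply chooses whichever layout is smaller. It uses java.io.DataOutput and DataInput, the same interfaces Hadoop's Writable serializes through. The tag byte is the kind of small per-vector storage overhead mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical codec: each vector carries a one-byte tag naming its encoding,
// so readers need no job-level vector-type argument.
final class TaggedVectorCodec {
  static final byte DENSE = 0;
  static final byte SPARSE = 1;

  private TaggedVectorCodec() {}

  // Writes tag, cardinality, then payload, choosing the smaller layout for this vector.
  static void write(DataOutput out, double[] values) throws IOException {
    int nonZero = 0;
    for (double v : values) if (v != 0.0) nonZero++;
    long denseBytes = 8L * values.length;   // one double per dimension
    long sparseBytes = 4L + 12L * nonZero;  // count + (int index, double value) pairs
    if (sparseBytes < denseBytes) {
      out.writeByte(SPARSE);
      out.writeInt(values.length);
      out.writeInt(nonZero);
      for (int i = 0; i < values.length; i++) {
        if (values[i] != 0.0) {
          out.writeInt(i);
          out.writeDouble(values[i]);
        }
      }
    } else {
      out.writeByte(DENSE);
      out.writeInt(values.length);
      for (double v : values) out.writeDouble(v);
    }
  }

  // Reads the tag first, then decodes with the matching layout.
  static double[] read(DataInput in) throws IOException {
    byte tag = in.readByte();
    double[] values = new double[in.readInt()];
    if (tag == DENSE) {
      for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
    } else if (tag == SPARSE) {
      int nonZero = in.readInt();
      for (int k = 0; k < nonZero; k++) {
        int index = in.readInt();
        values[index] = in.readDouble();
      }
    } else {
      throw new IOException("Unknown vector encoding tag: " + tag);
    }
    return values;
  }

  // Convenience helpers for round-tripping through a byte array.
  static byte[] toBytes(double[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      write(out, values);
    }
    return bytes.toByteArray();
  }

  static double[] fromBytes(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return read(in);
    }
  }
}
```

A mapper reading such records could call read() without any vector-type argument, while the output side could still use a configured type.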
> Convert Clustering Algs to use Vector Writable
> ----------------------------------------------
>
> Key: MAHOUT-137
> URL: https://issues.apache.org/jira/browse/MAHOUT-137
> Project: Mahout
> Issue Type: Improvement
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 0.2
>
> Attachments: MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch
>
>
> All M/R jobs should use Vector writable instead of encoding and decoding
> strings. We can have a separate utility that converts serialized GSON,
> Strings, whatever into the appropriate vectors. See MAHOUT-136 and
> http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable
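The separate conversion utility mentioned in the issue description might start out like the minimal sketch below. It assumes two simple text formats chosen for illustration only (comma-separated values for dense vectors, "index:value" pairs for sparse ones); StringVectorParser and both formats are hypothetical, the issue does not specify them, and GSON input is not covered.

```java
import java.util.Arrays;

// Hypothetical text-to-vector converter; the input formats are assumptions.
final class StringVectorParser {
  private StringVectorParser() {}

  // Parses text like "1.0, 0, 3.5" into a dense array of values.
  static double[] parseDense(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) return new double[0];
    return Arrays.stream(trimmed.split(","))
        .map(String::trim)
        .mapToDouble(Double::parseDouble)
        .toArray();
  }

  // Parses "index:value" pairs such as "0:1.5, 7:2.0" into an array of the given cardinality.
  static double[] parseSparse(String line, int cardinality) {
    double[] values = new double[cardinality];
    for (String token : line.split(",")) {
      String t = token.trim();
      if (t.isEmpty()) continue;
      int colon = t.indexOf(':');
      if (colon < 0) throw new IllegalArgumentException("Expected index:value, got " + t);
      int index = Integer.parseInt(t.substring(0, colon).trim());
      values[index] = Double.parseDouble(t.substring(colon + 1).trim());
    }
    return values;
  }
}
```

Keeping parsing in a standalone utility like this means the M/R jobs themselves only ever see binary vectors.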
--
This message is automatically generated by JIRA.