[jira] Commented: (MAHOUT-137) Convert Clustering Algs to use Vector Writable

Jeff Eastman (JIRA) Mon, 22 Jun 2009 10:48:39 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722708#action_12722708
 ]


Jeff Eastman commented on MAHOUT-137:
-------------------------------------

Yes, I saw that, and that was my original approach too. I do like having the
clustering jobs be vector-type agnostic, and pushing the type into an argument
does work. On output, we still need it as a job argument since we need to know
the type at config time. This also allows us to use the same internal form
between the mapper and reducer steps in a clustering job. I agree users would
rather not have to worry about specifying it if we could avoid it; maybe
that's the real question for core-user.

I also think it is unlikely that a given application of clustering would mix
sparse and dense vectors, though that approach would allow us to make the
particular encoding automatic on a per-instance basis. Using the optimized
AbstractVector methods on input would add a little storage overhead to the
input data but would allow this flexibility.
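
To illustrate the per-instance trade-off, here is a minimal sketch using only
java.io. It is not Mahout's AbstractVector code: the class TaggedVectorIO, its
one-byte tags, and the size heuristic are all hypothetical. Each record carries
a small tag naming its encoding, which is the "little storage overhead", and
the reader can then decode it without being told the vector type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Mahout's API: every record starts with a one-byte
// tag naming its encoding, so the reader needs no vector-type job argument.
final class TaggedVectorIO {
    static final byte DENSE = 0;   // assumed tag value
    static final byte SPARSE = 1;  // assumed tag value

    // Writes the vector in whichever encoding is smaller for this instance.
    static void write(double[] values, DataOutput out) throws IOException {
        int nonZero = 0;
        for (double v : values) {
            if (v != 0.0) nonZero++;
        }
        // Dense body: 8 bytes per element. Sparse body: a 4-byte count plus
        // 12 bytes (int index + double value) per non-zero element.
        boolean sparse = 4L + 12L * nonZero < 8L * values.length;
        out.writeByte(sparse ? SPARSE : DENSE);
        out.writeInt(values.length);
        if (sparse) {
            out.writeInt(nonZero);
            for (int i = 0; i < values.length; i++) {
                if (values[i] != 0.0) {
                    out.writeInt(i);
                    out.writeDouble(values[i]);
                }
            }
        } else {
            for (double v : values) out.writeDouble(v);
        }
    }

    // Reads one record, dispatching on the tag written by write().
    static double[] read(DataInput in) throws IOException {
        byte tag = in.readByte();
        double[] values = new double[in.readInt()];
        if (tag == SPARSE) {
            int nonZero = in.readInt();
            for (int k = 0; k < nonZero; k++) {
                int index = in.readInt();
                values[index] = in.readDouble();
            }
        } else if (tag == DENSE) {
            for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
        } else {
            throw new IOException("Unknown vector tag: " + tag);
        }
        return values;
    }

    // Convenience helpers for round-tripping through a byte array.
    static byte[] toBytes(double[] values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(values, new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static double[] fromBytes(byte[] bytes) throws IOException {
        return read(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        double[] mostlyZero = new double[10];
        mostlyZero[7] = 2.5;
        byte[] encoded = toBytes(mostlyZero);
        System.out.println("tag=" + encoded[0] + " bytes=" + encoded.length);
        System.out.println("restored[7]=" + fromBytes(encoded)[7]);
    }
}
```

With a scheme like this the input could mix sparse and dense records freely,
while the job's output type would still be chosen by a job argument.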



> Convert Clustering Algs to use Vector Writable
> ----------------------------------------------
>
>                 Key: MAHOUT-137
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-137
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 0.2
>
>         Attachments: MAHOUT-137.patch, MAHOUT-137.patch, MAHOUT-137.patch
>
>
> All M/R jobs should use Vector writable instead of encoding and decoding 
> strings.  We can have a separate utility that converts serialized GSON, 
> Strings, whatever into the appropriate vectors.  See MAHOUT-136 and 
> http://www.lucidimagination.com/search/document/6a55f260826fd77f/jira_commented_mahout_136_change_canopy_mr_implementation_to_use_vector_writable

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
