[jira] Updated: (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2

Shannon Quinn (JIRA) Thu, 06 Jan 2011 10:32:33 -0800

     [ 
https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Shannon Quinn updated MAHOUT-537:
---------------------------------

    Attachment: MAHOUT-537.patch

Attached is a new version of the patch. It drops the custom Writable I wrote 
and uses NamedVector instead.

It seems (to me) that there are two options for eliminating the two extra M/R 
tasks I had to create in lieu of the CompositeInputFormat's joins:

1) Label each row of a DistributedRowMatrix when it is first created. 
Since DRM isn't much more than a glorified wrapper around existing data, its 
constructor can't do this labeling, so this option looks infeasible from a 
scope perspective.
2) Guarantee the ordering of the two rows in the Iterable object passed to a 
Combiner/Reducer, so we know which one belongs to the multiplicand and which 
to the multiplier.

Option #2 seems the more technically feasible of the two. However, my limited 
understanding of the inner workings of Hadoop prevents me from knowing where 
to start. I've taken a look at Partitioner, RecordReader, and various 
InputFormats, and they haven't given me any intuition. Any thoughts on how to 
do this? Or is there another method entirely?
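
For reference, one common way to get the ordering described in option #2 is a 
"secondary sort": the map output key carries both the row index and a tag 
naming the source matrix, the sort comparator orders by (row index, tag), and 
the grouping comparator and partitioner look at the row index only. In the 
0.20.2 API I believe the relevant hooks are Job.setSortComparatorClass, 
Job.setGroupingComparatorClass and Job.setPartitionerClass. The sketch below 
is not part of the patch and does not use Hadoop at all; it only simulates 
the sort-then-group step in plain Java to show why the multiplicand row would 
always arrive first. TaggedRow, the tag constants and shuffle() are 
hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TaggedRowOrdering {
    // Hypothetical source tags; the lower value sorts first within a row index.
    static final int MULTIPLICAND = 0;
    static final int MULTIPLIER = 1;

    /** Hypothetical stand-in for a map output record: composite key plus row data. */
    static final class TaggedRow {
        final int rowIndex;
        final int source;
        final double[] values;

        TaggedRow(int rowIndex, int source, double[] values) {
            this.rowIndex = rowIndex;
            this.source = source;
            this.values = values;
        }
    }

    // Plays the role of the sort comparator: row index first, then source tag.
    static final Comparator<TaggedRow> SORT_ORDER =
        Comparator.comparingInt((TaggedRow r) -> r.rowIndex)
                  .thenComparingInt(r -> r.source);

    /**
     * Simulates the shuffle: sort on the full composite key, then group on the
     * row index alone (the role of the grouping comparator).
     */
    static Map<Integer, List<TaggedRow>> shuffle(List<TaggedRow> mapOutput) {
        List<TaggedRow> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<Integer, List<TaggedRow>> groups = new LinkedHashMap<>();
        for (TaggedRow row : sorted) {
            groups.computeIfAbsent(row.rowIndex, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Records arrive in arbitrary order, as they would from the mappers.
        List<TaggedRow> mapOutput = new ArrayList<>();
        mapOutput.add(new TaggedRow(1, MULTIPLIER, new double[] {7.0, 8.0}));
        mapOutput.add(new TaggedRow(0, MULTIPLIER, new double[] {5.0, 6.0}));
        mapOutput.add(new TaggedRow(0, MULTIPLICAND, new double[] {1.0, 2.0}));
        mapOutput.add(new TaggedRow(1, MULTIPLICAND, new double[] {3.0, 4.0}));

        for (Map.Entry<Integer, List<TaggedRow>> group : shuffle(mapOutput).entrySet()) {
            List<TaggedRow> rows = group.getValue();
            String first = rows.get(0).source == MULTIPLICAND ? "multiplicand" : "multiplier";
            String second = rows.get(1).source == MULTIPLICAND ? "multiplicand" : "multiplier";
            System.out.println("row " + group.getKey() + ": first=" + first + ", second=" + second);
        }
    }
}
```

In a real job the tag would have to be set in the mapper, for example from 
the path of the input split, so this still assumes the mapper can tell which 
matrix a given row came from.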

> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, 
> MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API, 
> in particular eliminate dependence on the deprecated JobConf, using instead 
> the separate Job and Configuration objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
