[ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shannon Quinn updated MAHOUT-537:
---------------------------------

    Attachment: MAHOUT-537_hack.patch

Ok, this is absolutely a total hack job, but I wanted to see if it would work: 
taking the 0.21 mapreduce.lib.join* package, tweaking it slightly to make it 
0.20-compatible, and installing it directly in Mahout to make 
DistributedRowMatrix 0.20-compliant.

The patch and its associated tests compile, but the tests fail. The cause 
seems to be that the patched code won't write files to DistributedCache, 
HDFS, etc. I tried writing to DistributedCache and immediately reading it 
back, which worked fine, but otherwise I'm stuck and could use some help.

If this isn't an avenue worth pursuing, that's also fine. I had the idea and 
wanted to give it a shot before throwing in the towel and waiting for 0.22.

> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.4, 0.5
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 0.6
>
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, 
> MAHOUT-537.patch, MAHOUT-537_hack.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API; 
> in particular, eliminate the dependence on the deprecated JobConf and use 
> the separate Job and Configuration objects instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
