[ 
https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976244#action_12976244
 ] 

Sean Owen commented on MAHOUT-537:
----------------------------------

I think this is a great effort. It's essential that the project stay on
0.20.2 for now, because I believe many people will want to use it with
Amazon EMR, which runs 0.20.2. We still have some code written against
0.19.x, and moving off that is a higher priority than moving onto 0.21.x, I
think. Complicating this, 0.21.x is not backward compatible with 0.20.x.

NamedVector is already supported in VectorWritable; do we need a new Writable?

Is the issue that you are doing joins? It's still possible without
CompositeInputFormat, and we use the pattern elsewhere. You need some
cleverness with a custom key and partitioner that send key x from source A
and key x from source B to the same reducer, while the key carries a bit
indicating whether it came from A or B.
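
To make the pattern concrete, here is a minimal sketch in plain Java with no Hadoop dependency. All of the names in it (TaggedJoinSketch, TaggedKey, partitionFor) are hypothetical and are not taken from Mahout or Hadoop. In a real job the key would presumably implement WritableComparable and the partitioning logic would live in a custom Partitioner; this only illustrates the idea that the partition depends on the join key alone, while the source bit travels inside the key and only affects sort order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a tagged join key; not Mahout's or Hadoop's API.
class TaggedJoinSketch {

    /** Composite key: the join key plus one bit naming the source. */
    static final class TaggedKey implements Comparable<TaggedKey> {
        final int joinKey;
        final boolean fromB; // false = source A, true = source B

        TaggedKey(int joinKey, boolean fromB) {
            this.joinKey = joinKey;
            this.fromB = fromB;
        }

        /** Sort by join key first, then put source A ahead of source B. */
        @Override
        public int compareTo(TaggedKey other) {
            int byKey = Integer.compare(joinKey, other.joinKey);
            if (byKey != 0) {
                return byKey;
            }
            return Boolean.compare(fromB, other.fromB);
        }

        @Override
        public String toString() {
            return joinKey + (fromB ? "/B" : "/A");
        }
    }

    /**
     * Chooses a reducer from the join key only. The source bit is ignored,
     * so key x from A and key x from B always land on the same reducer.
     */
    static int partitionFor(TaggedKey key, int numReducers) {
        return (Integer.hashCode(key.joinKey) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        List<TaggedKey> keys = new ArrayList<>();
        keys.add(new TaggedKey(7, true));
        keys.add(new TaggedKey(3, false));
        keys.add(new TaggedKey(7, false));
        keys.add(new TaggedKey(3, true));

        // Stand-in for the shuffle sort: records for the same join key
        // become adjacent, with the A record ahead of the B record.
        Collections.sort(keys);

        for (TaggedKey key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, 4));
        }
    }
}
```

Because the partition ignores the source bit, the reducer sees both sides of each join key, and the sort order lets it read the A record before the B record. A real job would also need to group the two tagged keys into one reduce call, or have the reducer track the previous join key itself.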

> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API, 
> in particular eliminate dependence on the deprecated JobConf, using instead 
> the separate Job and Configuration objects.

