[ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976244#action_12976244 ]
Sean Owen commented on MAHOUT-537:
----------------------------------
I think this is a great effort. It's essential that the project stay on
0.20.2 for now, because I believe many people will want to use it with
Amazon EMR, which runs 0.20.2. We still have some code written for 0.19.x,
and I think moving off that is a higher priority than moving onto 0.21.x.
Complicating this is the fact that 0.21.x is not backward compatible with
0.20.x.
NamedVector is already supported in VectorWritable; do we need a new Writable?

Is the issue that you are doing joins? That's still possible without
CompositeInputFormat, and we use the pattern elsewhere. You need some
cleverness with a custom key and partitioner that will send key x from
source A and key x from source B to the same reducer, while carrying inside
the key a bit that indicates whether it's from A or B.
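
To make the key-and-partitioner idea concrete, here is a minimal sketch in
plain Java rather than against the Hadoop API. All names (TaggedJoinSketch,
TaggedKey, partition, compare) are hypothetical and are not taken from Mahout.
In a real job the key would presumably implement WritableComparable and the
partition logic would sit in a Partitioner subclass; this only illustrates
the routing and ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the tagged-key join pattern (hypothetical names).
// Not Hadoop code: it only shows how records get routed and ordered.
final class TaggedJoinSketch {

  /** Join key plus a one-bit tag recording which source the record came from. */
  static final class TaggedKey {
    final int joinKey;
    final boolean fromB; // false = source A, true = source B

    TaggedKey(int joinKey, boolean fromB) {
      this.joinKey = joinKey;
      this.fromB = fromB;
    }
  }

  /** Partition on the join key only, ignoring the tag, so that key x from A
   *  and key x from B always land on the same reducer. */
  static int partition(TaggedKey key, int numReducers) {
    return (Integer.hashCode(key.joinKey) & Integer.MAX_VALUE) % numReducers;
  }

  /** Sort by join key first, then by tag, so A's record precedes B's. */
  static int compare(TaggedKey a, TaggedKey b) {
    int c = Integer.compare(a.joinKey, b.joinKey);
    return c != 0 ? c : Boolean.compare(a.fromB, b.fromB);
  }

  public static void main(String[] args) {
    List<TaggedKey> keys = new ArrayList<>();
    keys.add(new TaggedKey(7, true));
    keys.add(new TaggedKey(3, false));
    keys.add(new TaggedKey(7, false));
    keys.sort(TaggedJoinSketch::compare);
    for (TaggedKey k : keys) {
      System.out.println("key " + k.joinKey + " from " + (k.fromB ? "B" : "A")
          + " -> reducer " + partition(k, 4));
    }
  }
}
```

The point is that the tag takes part in sorting but not in partitioning. A
real job would likely also need a grouping comparator that looks only at the
join key, so that both records reach a single reduce call.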
> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
> Key: MAHOUT-537
> URL: https://issues.apache.org/jira/browse/MAHOUT-537
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.4
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API,
> in particular eliminate dependence on the deprecated JobConf, using instead
> the separate Job and Configuration objects.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.