[
https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044010#comment-13044010
]
Dmitriy Lyubimov edited comment on MAHOUT-537 at 6/3/11 8:14 PM:
-----------------------------------------------------------------
I second Jake.
bq. I think the better solution at this point is to move to Hadoop 0.21 as
part of the next release.
-1 on this for now. (If I recall correctly, Ted had concerns about this move as
well.)
At the risk of sounding like a stuck record, nobody that I know of is using
0.21. 0.21 is not production grade, which even the Hadoop team has recognized.
It is true that 0.21 is a superset of CDH, but it potentially has things CDH
doesn't have, so using 0.21 does not guarantee everything will work with CDH,
and it almost certainly guarantees that nothing will work for bulk work on EMR.
We use both EMR and CDH. If you bump up the dependencies, then as things stand
now it will absolutely preclude us from using future versions of Mahout. I
could probably rework some of the code that we use with CDH to verify it still
works with CDH, but not en masse. If I really wanted to use some of the
migrated algorithms and take advantage of various fixes, I would have to create
massive private hacks to keep them working (similar to what Cloudera does),
which we probably don't have the capacity to do, *so I'll just have to stop
using trunk or future Mahout distributions until better times.*
*I know for sure we will never use 0.21 the way it is released.*
There's probably more hope for the new generation of Hadoop, which would
combine the ability to run old MR, new MR, or something else. In fact, I am
looking forward to porting to and using that future Hadoop generation, as it
would allow us to scrap many unnecessary limitations of MR for parallel use
that are holding back performance of many algorithms (esp. linear algebra
algorithms).
> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
> Key: MAHOUT-537
> URL: https://issues.apache.org/jira/browse/MAHOUT-537
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.4, 0.5
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Fix For: 0.6
>
> Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch,
> MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API,
> in particular eliminate dependence on the deprecated JobConf, using instead
> the separate Job and Configuration objects.