[jira] [Issue Comment Edited] (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2

Dmitriy Lyubimov (JIRA) Fri, 03 Jun 2011 13:15:42 -0700

https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044010#comment-13044010


Dmitriy Lyubimov edited comment on MAHOUT-537 at 6/3/11 8:14 PM:
-----------------------------------------------------------------

Second Jake. 
bq.  I think the better solution at this point is to move to Hadoop 0.21 as 
part of the next release. 

-1 on this for now. (If I recall correctly, Ted had concerns about this move as well.)

At the risk of sounding like a broken record, nobody I know is using 0.21.
0.21 is not production grade, which even the Hadoop team has acknowledged.

It is true that 0.21 is a superset of CDH, but it potentially has features CDH
doesn't have, so using 0.21 does not guarantee that everything will work with
CDH, and it almost certainly guarantees that nothing will work for bulk jobs on
EMR.

We use both EMR and CDH. If you bump up the dependencies as things stand now, it
will absolutely preclude us from using further versions of Mahout. I could
probably adapt some of the code we use with CDH and verify it still works with
CDH, but not en masse. If I really wanted to use some of the migrated algorithms
and take advantage of various fixes, I would have to create massive private
hacks to keep them working (similar to what Cloudera does). We probably don't
have the capacity to do that, *so I'll just have to stop using trunk or future
Mahout distributions until better times.*

*I know for sure we will never use 0.21 the way it is released.*

There's probably more hope for the next generation of Hadoop, which would
combine the ability to run old MR, new MR, or something else. In fact, I am
looking forward to porting to and using that future Hadoop generation, as it
would allow us to scrap many unnecessary limitations of MR for parallel use
that are holding back performance on many algorithms (esp. linear algebra
algorithms).

> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.4, 0.5
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 0.6
>
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, 
> MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API;
> in particular, eliminate the dependence on the deprecated JobConf and use the
> separate Job and Configuration objects instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
