[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288332#comment-13288332
 ] 

Amar Kamat commented on MAPREDUCE-4305:
---------------------------------------

Mayank,
I assume that you are using Hadoop 0.22. The numbers that we are seeing (on 
0.20.x) are different from what you have reported. IIRC, the Hadoop 0.22 code 
is still the old Hadoop codebase (compared to 0.23/trunk) and should be 
similar to Hadoop 0.20. Can you re-run your experiments on 0.20.x (i.e., 
branch 1.x) and share your findings?
                
> Implement delay scheduling in capacity scheduler for improving data locality
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4305
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4305
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>
> With the Capacity Scheduler, only about 40-50% of tasks are data-local, 
> which is not good. In my tests on a 70-node cluster, I consistently get 
> data locality of around 40-50% on a free cluster.
> I think we need to implement something like delay scheduling in the capacity 
> scheduler for improving the data locality.
> http://radlab.cs.berkeley.edu/publication/308
> After implementing delay scheduling on Hadoop 0.22, I am getting 100% data 
> locality on a free cluster and around 90% data locality on a busy cluster.
> Thanks,
> Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
