[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995408#comment-12995408
 ] 

Jordà Polo commented on MAPREDUCE-1380:
---------------------------------------

I'm attaching a new version of the Adaptive Scheduler.

This version is a complete reimplementation with a different architecture,
roughly described in the attached PDF document. It supports the same features
as the previous version, and also adds new features and a framework for
future improvements.

The new features mostly focus on making the scheduler more resource-aware and
on allowing a dynamic number of running tasks, depending on the jobs and
their resource needs (instead of a fixed number of slots).

It is still a work in progress and requires some additional tuning, but I
thought it would be interesting to publish it as it is now, given some of the
ideas that have been proposed for Hadoop MapReduce NextGen (MAPREDUCE-279). The
scheduler currently relies on job profiling information to optimize cluster
utilization, but our goal is to get rid of such profiles and implement a more
dynamic approach (e.g. using the resource information introduced by
MAPREDUCE-1218).

I still don't know the status of the "NextGen" proposal and its
implementation, but as soon as more details about NextGen are available we'll
see whether it makes sense to adapt the scheduler, or to reuse some of its
ideas, in the new Hadoop MapReduce architecture.


> Adaptive Scheduler
> ------------------
>
>                 Key: MAPREDUCE-1380
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jordà Polo
>            Priority: Minor
>         Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch
>
>
> The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically
> adjusts the amount of resources used depending on the performance of jobs and
> on user-defined high-level business goals.
> Existing Hadoop schedulers are focused on managing large, static clusters in
> which nodes are added or removed manually. In contrast, the goal of this
> scheduler is to improve the integration of Hadoop, and of the applications
> that run on top of it, with environments that allow more dynamic
> provisioning of resources.
> The current implementation is quite straightforward. Users specify a deadline 
> at job submission time, and the scheduler adjusts the resources to meet that 
> deadline (at the moment, the scheduler can be configured to either minimize 
> or maximize the amount of resources). If multiple jobs are run 
> simultaneously, the scheduler prioritizes them by deadline. Note that the 
> current approach to estimating the completion time of jobs is quite simplistic:
> it is based on the time it takes to finish each task, so it works well with 
> regular jobs, but there is still room for improvement for unpredictable jobs.
> The idea is to further integrate it with cloud-like and virtual environments 
> (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
> able to meet its deadline, the scheduler automatically requests more 
> resources.
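
As an illustration of the deadline-driven behaviour described in the quoted
issue text, here is a minimal sketch in Python. It is not the code from the
attached patches. The names `Job`, `slots_needed` and `allocate`, and the
"waves of tasks" model, are assumptions made for this example. The sketch
estimates completion from the average time of finished tasks, picks the
smallest number of concurrent tasks that would still meet the deadline (the
"minimize resources" configuration), and serves jobs in deadline order.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions for this sketch.
    name: str
    deadline: float        # seconds remaining until the deadline
    pending_tasks: int     # tasks that have not finished yet
    avg_task_time: float   # mean duration of finished tasks, in seconds (> 0)


def slots_needed(job: Job) -> int:
    """Smallest number of concurrent tasks expected to meet the deadline."""
    if job.pending_tasks == 0:
        return 0
    # With s concurrent tasks the job runs in ceil(pending / s) "waves",
    # each assumed to last about avg_task_time.
    waves_available = math.floor(job.deadline / job.avg_task_time)
    if waves_available < 1:
        # Not even one wave fits: the deadline will be missed, so the best
        # this sketch can do is ask for every pending task to run at once.
        return job.pending_tasks
    return math.ceil(job.pending_tasks / waves_available)


def allocate(jobs, total_slots):
    """Grant slots to jobs in earliest-deadline-first order."""
    allocation = {}
    free = total_slots
    for job in sorted(jobs, key=lambda j: j.deadline):
        granted = min(slots_needed(job), free)
        allocation[job.name] = granted
        free -= granted
    return allocation
```

For example, `allocate([Job("a", 100, 10, 20), Job("b", 50, 6, 25)], 4)` gives
job "b" (the earlier deadline) 3 slots and job "a" the remaining 1. Job "a"
would have needed 2, which is the kind of shortfall where the issue text
suggests requesting more resources from the environment.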

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

