[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tang shanjiang updated MAPREDUCE-5643:
--------------------------------------

    Description: 
Hadoop MRv1 uses a slot-based resource model with a static configuration of
map and reduce slots. It imposes a strict constraint: map tasks can run only
on map slots, and reduce tasks can run only on reduce slots. Because of the
rigid execution order between map and reduce tasks in a MapReduce environment,
slots can be severely under-utilized, which significantly degrades performance.

In contrast to YARN, which gives up the slot-based resource model and proposes
a container-based model that maximizes resource utilization by not
distinguishing between map and reduce tasks, we keep the slot-based model and
propose DynamicMR, a dynamic slot utilization optimization system. DynamicMR
improves the performance of Hadoop by maximizing slot utilization and
improving utilization efficiency while guaranteeing fairness across pools.
It consists of three levels of scheduling components: the Dynamic Hadoop
Fair Scheduler (DHFS), the Dynamic Speculative Task Scheduler (DSTS), and the
Data Locality Maximization Scheduler (DLMS).

Our tests show that DynamicMR outperforms YARN for MapReduce workloads with
multiple jobs, especially when the number of jobs is large. Our explanation is
that, for a given amount of resources, controlling the ratio of concurrently
running map and reduce tasks performs better than leaving it uncontrolled.
Without such control, too many reduce tasks can easily run at once, making the
network a serious bottleneck. In YARN, both map and reduce tasks can run on
any idle container, and there is no mechanism to control the ratio of
resources allocated to map and reduce tasks. As a result, when reduce tasks
are pending, an idle container will most likely be taken by them. DynamicMR,
in contrast, follows the traditional slot-based model. Instead of the 'hard'
constraint that map slots must be allocated to map tasks and reduce slots to
reduce tasks, DynamicMR obeys a 'soft' constraint that allows a map slot to be
allocated to a reduce task and vice versa. However, whenever there are pending
map tasks, map slots are given to map tasks first, and the same rule applies
to reduce slots and reduce tasks. This means that the traditional static
map/reduce slot configuration still controls the ratio of running map and
reduce tasks in DynamicMR. Whereas YARN only maximizes resource utilization,
DynamicMR maximizes slot utilization while also dynamically controlling the
ratio of running map/reduce tasks through the map/reduce slot configuration.

  was:
Hadoop MRv1 uses a slot-based resource model with a static configuration of
map and reduce slots. It imposes a strict constraint: map tasks can run only
on map slots, and reduce tasks can run only on reduce slots. Because of the
rigid execution order between map and reduce tasks in a MapReduce environment,
slots can be severely under-utilized, which significantly degrades performance.

In contrast to YARN, which gives up the slot-based resource model to maximize
resource utilization, we keep the slot-based model and propose DynamicMR, a
dynamic slot utilization optimization system. DynamicMR improves the
performance of Hadoop by maximizing slot utilization and improving utilization
efficiency while guaranteeing fairness across pools. It consists of three
levels of scheduling components: the Dynamic Hadoop Fair Scheduler (DHFS), the
Dynamic Speculative Task Scheduler (DSTS), and the Data Locality Maximization
Scheduler (DLMS).

Our tests show that DynamicMR outperforms YARN for MapReduce workloads with
multiple jobs, especially when the number of jobs is large. Our explanation is
that, for a given amount of resources, controlling the ratio of concurrently
running map and reduce tasks performs better than leaving it uncontrolled.
Without such control, too many reduce tasks can easily run at once, making the
network a serious bottleneck. In YARN, both map and reduce tasks can run on
any idle container, and there is no mechanism to control the ratio of
resources allocated to map and reduce tasks. As a result, when reduce tasks
are pending, an idle container will most likely be taken by them. DynamicMR,
in contrast, follows the traditional slot-based model. Instead of the 'hard'
constraint that map slots must be allocated to map tasks and reduce slots to
reduce tasks, DynamicMR obeys a 'soft' constraint that allows a map slot to be
allocated to a reduce task and vice versa. However, whenever there are pending
map tasks, map slots are given to map tasks first, and the same rule applies
to reduce slots and reduce tasks. This means that the traditional static
map/reduce slot configuration still controls the ratio of running map and
reduce tasks in DynamicMR. Whereas YARN only maximizes resource utilization,
DynamicMR maximizes slot utilization while also dynamically controlling the
ratio of running map/reduce tasks through the map/reduce slot configuration.
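
To make the 'soft' constraint described above concrete, here is a minimal
sketch in Java. It is not taken from the attached patch: the class
SoftSlotAllocator, the Kind enum, and both method names are hypothetical, and
the sketch ignores pools, fairness, speculation, and data locality. It only
shows the rule that an idle slot serves a pending task of its own type first
and is lent to the other type when none is pending.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration of the 'soft' slot constraint; not DynamicMR code.
class SoftSlotAllocator {
    enum Kind { MAP, REDUCE }

    // Pending task ids, kept in arrival order for each task type.
    private final Queue<String> pendingMaps = new ArrayDeque<>();
    private final Queue<String> pendingReduces = new ArrayDeque<>();

    void addPending(Kind kind, String taskId) {
        (kind == Kind.MAP ? pendingMaps : pendingReduces).add(taskId);
    }

    // Pick a task for an idle slot of the given kind.
    // Returns null when no task of either type is pending.
    String assign(Kind slotKind) {
        Queue<String> preferred = slotKind == Kind.MAP ? pendingMaps : pendingReduces;
        Queue<String> other = slotKind == Kind.MAP ? pendingReduces : pendingMaps;

        // Tasks matching the slot type always go first.
        String task = preferred.poll();
        if (task != null) {
            return task;
        }
        // Otherwise lend the slot to the other task type instead of idling.
        return other.poll();
    }
}
```

With one pending map task and two pending reduce tasks, the first idle map
slot goes to the map task and the second idle map slot is lent to a reduce
task. Under a 'hard' constraint that second map slot would stay idle.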


> DynamicMR: A Dynamic Slot Utilization Optimization Framework for Hadoop MRv1
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5643
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5643
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 1.2.1
>            Reporter: tang shanjiang
>            Assignee: tang shanjiang
>              Labels: performance
>         Attachments: DynamicMR-0.1.1-patch, README
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
