[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated MAPREDUCE-6877:
---------------------------------
    Description: It would be good to use SSD in HDFS to improve read/write 
performance. However, SSD costs more than HDD, so the ONE-SSD storage policy 
is a tradeoff that balances performance and cost. This raises the question of 
whether applications will read the replica on SSD. If applications do not 
preferentially read the replica on SSD, the advantage of SSD is not fully 
utilized. MapReduce currently assigns tasks according to data locality only. 
The storage types of all the replicas of each split should also be taken into 
consideration, so that a map task is preferentially assigned to a node where 
its split is located on a faster storage type.  (was: SSD has been 
widely used in HDFS to improve reading/writing performance. However, SSD costs 
much more than HDD, so there is a tradeoff policy ONE-SSD to balance the 
performance and cost. But there occurs a problem whether applications will read 
the replication on SSD. If applications cannot read the replication on SSD, the 
advantage of SSD can no longer be utilized, which will lead to much poorer 
performance compared to ALL-SSD policy. The current MapReduce only assign tasks 
according to data locality. The storage types of all the replications of each 
split should also been taken into consideration in order to assign map task 
preferentially to a node where its split is located on a faster storage type.)

> Assign map task preferentially to the data node where the split is on faster 
> storage type
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6877
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6877
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tim Yao
>
> It would be good to use SSD in HDFS to improve read/write performance. 
> However, SSD costs more than HDD, so the ONE-SSD storage policy is a 
> tradeoff that balances performance and cost. This raises the question of 
> whether applications will read the replica on SSD. If applications do not 
> preferentially read the replica on SSD, the advantage of SSD is not fully 
> utilized. MapReduce currently assigns tasks according to data locality only. 
> The storage types of all the replicas of each split should also be taken 
> into consideration, so that a map task is preferentially assigned to a node 
> where its split is located on a faster storage type.
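
The proposal above can be illustrated with a minimal sketch. It is not the
MapReduce scheduler and not a patch attached to this issue: the class
StorageAwareLocality, the StorageTier enum, the Replica type and both methods
are hypothetical names invented for illustration. It assumes each split
exposes the host and storage tier of every replica, and it ranks the
data-local hosts so that a replica on faster storage is tried first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: rank the replica hosts of one input split by storage speed. */
final class StorageAwareLocality {

    /** Illustrative tiers, declared fastest first (an assumption, not Hadoop's enum). */
    enum StorageTier { RAM_DISK, SSD, DISK, ARCHIVE }

    /** One replica of a split: the host that stores it and the tier it sits on. */
    static final class Replica {
        final String host;
        final StorageTier tier;

        Replica(String host, StorageTier tier) {
            this.host = host;
            this.tier = tier;
        }
    }

    /** Returns replica hosts, fastest tier first; equal tiers keep their input order. */
    static List<String> preferredHosts(List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        // List.sort is stable, so hosts on the same tier are not reordered.
        sorted.sort(Comparator.comparingInt((Replica r) -> r.tier.ordinal()));
        List<String> hosts = new ArrayList<>();
        for (Replica r : sorted) {
            hosts.add(r.host);
        }
        return hosts;
    }

    /**
     * Picks the fastest-tier replica host that currently has a free slot.
     * Returns null when no replica host is free, so the caller can fall back
     * to its usual rack-local or off-rack choice.
     */
    static String pickHost(List<Replica> replicas, Set<String> hostsWithFreeSlots) {
        for (String host : preferredHosts(replicas)) {
            if (hostsWithFreeSlots.contains(host)) {
                return host;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A split stored under a one-SSD style layout: one replica on SSD, two on disk.
        List<Replica> replicas = Arrays.asList(
                new Replica("node-a", StorageTier.DISK),
                new Replica("node-b", StorageTier.SSD),
                new Replica("node-c", StorageTier.DISK));
        System.out.println(preferredHosts(replicas)); // prints [node-b, node-a, node-c]
    }
}
```

Data locality is still respected in this sketch, since only hosts that hold a
replica are considered; the storage tier only decides the order among them.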



