[ 
https://issues.apache.org/jira/browse/HDFS-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938470#comment-16938470
 ] 

Stephen O'Donnell commented on HDFS-14872:
------------------------------------------

How would you see this working? The client downloads each block and stores it 
as a temp file and then concatenates all the pieces when it has all the blocks?
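
One possible alternative to temp files, sketched below under stated assumptions: 
pre-size the destination and write each block at its own offset, so no 
concatenation step is needed. This is only an illustration on local files, not 
the HDFS client API. The function name `copy_blocks_in_random_order`, the tiny 
`BLOCK_SIZE`, and the `seed` parameter are all hypothetical.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4  # hypothetical tiny block size so the demo is easy to follow


def copy_blocks_in_random_order(src_path, dst_path, block_size=BLOCK_SIZE, seed=None):
    """Copy src to dst one block at a time, visiting blocks in shuffled order."""
    size = os.path.getsize(src_path)
    num_blocks = (size + block_size - 1) // block_size  # ceiling division
    order = list(range(num_blocks))
    random.Random(seed).shuffle(order)

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.truncate(size)  # pre-size the output so any offset can be written
        for index in order:
            offset = index * block_size
            src.seek(offset)
            data = src.read(block_size)  # the last block may be short
            dst.seek(offset)
            dst.write(data)
    return order


# Usage: copy a small local file and check the result matches byte for byte.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "source.bin")
    dst_file = os.path.join(tmp, "copy.bin")
    payload = bytes(range(23))  # 23 bytes -> 6 blocks, the last one short
    with open(src_file, "wb") as f:
        f.write(payload)
    visit_order = copy_blocks_in_random_order(src_file, dst_file, seed=7)
    with open(dst_file, "rb") as f:
        copied = f.read()
```

Writing at offsets only works when the destination supports random-access 
writes, as a local file does. A destination that can only be appended to would 
still need the temp-file-and-concatenate approach described above.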

It's worth noting that if the client is a MapReduce job, the blocks are read 
randomly by each split, so this does spread the load. For a normal 3-replica 
file, the node picked to read from should be random across the 3 nodes, so the 
load is already spread over 3 nodes. If some files are hot, they could be given 
a higher replication factor to help further.

> Read HDFS Blocks in Random Order
> --------------------------------
>
>                 Key: HDFS-14872
>                 URL: https://issues.apache.org/jira/browse/HDFS-14872
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>    Affects Versions: 2.8.5, 3.2.1
>            Reporter: David Mollitor
>            Priority: Major
>
> When the HDFS client is downloading (copying) an entire file, allow the 
> client to download the blocks in random order.  If a lot of clients are 
> reading the same file, in parallel, they will all download the first block, 
> the second block, and so on, stampeding down the line.
> It would be interesting to spread the load across all the available 
> DataNodes.


