DFS should not use a round robin policy when determining on which volume (file
system partition) to allocate the next block
--------------------------------------------------------------------------------------------------------------------------
Key: HADOOP-2094
URL: https://issues.apache.org/jira/browse/HADOOP-2094
Project: Hadoop
Issue Type: Improvement
Components: dfs
Reporter: Runping Qi
Assignee: Runping Qi
When multiple file system partitions are configured for the data storage of a
data node, it uses a strict round robin policy to decide which partition to use
when writing the next block. This may result in anomalous cases in which the
blocks of a file are not evenly distributed across the partitions. For example,
when we use distcp to copy files with 4 mappers running concurrently on each
node, those 4 mappers write to DFS at about the same rate, so it is possible
that they write out their blocks in an interleaved fashion. If there are 4 file
system partitions configured for the local data node, each mapper may then keep
writing its blocks onto the same file system partition.

A simple random placement policy would avoid such anomalous cases and does not
have any obvious drawbacks.
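
To make the anomaly concrete, here is a minimal Java sketch contrasting the two
policies. This is not the actual DataNode implementation; the class and method
names (VolumeChooser, chooseRoundRobin, chooseRandom) are hypothetical and only
illustrate the proposed change.

{code:java}
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch: two ways a data node could pick a volume
 * (file system partition) for the next block to be written.
 */
public class VolumeChooser {
    private final List<String> volumes;      // e.g. the configured dfs.data.dir entries
    private final AtomicInteger nextIndex = new AtomicInteger(0);
    private final Random random = new Random();

    public VolumeChooser(List<String> volumes) {
        this.volumes = volumes;
    }

    /**
     * Strict round robin. With 4 volumes and 4 writers whose block writes
     * interleave in lock step, each writer can repeatedly land on the same
     * partition, skewing the per-file block distribution.
     */
    public String chooseRoundRobin() {
        int i = Math.floorMod(nextIndex.getAndIncrement(), volumes.size());
        return volumes.get(i);
    }

    /**
     * Random placement. Each block lands on a uniformly chosen volume
     * regardless of how concurrent writers interleave, which breaks the
     * lock-step pattern described above.
     */
    public String chooseRandom() {
        return volumes.get(random.nextInt(volumes.size()));
    }
}
{code}

Under the round robin policy above, the skew only appears when the number of
concurrent writers matches (or divides) the number of volumes; the random
policy has no such dependence on write interleaving.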