[
https://issues.apache.org/jira/browse/HADOOP-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560037#action_12560037
]
Raghu Angadi commented on HADOOP-2094:
--------------------------------------
Random partition selection is fine, and the patch looks fine. Assuming 4
partitions: if there are two writers, there is a 25% probability that both
write to the same partition. With 3 writers, the probability that 2 or more
write to the same disk becomes 62.5%, and with 4 writers it is about 90%. If
that is OK, then this patch is fine. Assuming these apps are typically IO
bound, this sounds like a pretty large penalty.
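For reference, the figures above can be reproduced with a small calculation.
This is only a sketch: it assumes 4 partitions (as in the issue description)
and that each writer picks a partition uniformly and independently. The
function name collision_probability is made up for illustration and is not
part of the patch.

```python
def collision_probability(writers: int, partitions: int = 4) -> float:
    """Probability that at least two writers pick the same partition,
    assuming each writer picks uniformly and independently at random."""
    # Probability that all writers land on distinct partitions:
    # (p/p) * ((p-1)/p) * ((p-2)/p) * ...
    all_distinct = 1.0
    for i in range(writers):
        all_distinct *= max(partitions - i, 0) / partitions
    return 1.0 - all_distinct


if __name__ == "__main__":
    for w in (2, 3, 4):
        print(f"{w} writers: {collision_probability(w):.3%}")
```

With 4 partitions this gives 25% for 2 writers, 62.5% for 3, and 90.625% for 4.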
But I don't know how it fixes the problems reported in the description.
Actually, I did not quite understand the problem anyway.
> DFS should not use round robin policy in determining on which volume (file
> system partition) to allocate for the next block
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2094
> URL: https://issues.apache.org/jira/browse/HADOOP-2094
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: Runping Qi
> Assignee: dhruba borthakur
> Attachments: randomDatanodePartition.patch
>
>
> When multiple file system partitions are configured for the data storage of a
> data node, the data node uses a strict round robin policy to decide which
> partition to use for writing the next block.
> This may result in anomalous cases in which the blocks of a file are not
> evenly distributed across the partitions. For example, when we use distcp to
> copy files with each node having 4 mappers running concurrently, those 4
> mappers write to DFS at about the same rate. Thus, it is possible that the 4
> mappers write out blocks in an interleaved fashion. If there are 4 file system
> partitions configured for the local data node, it is possible that each mapper
> will continue to write its blocks on to the same file system partition.
> A simple random placement policy will avoid such anomalous cases, and does not
> have any obvious drawbacks.
>
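The scenario in the description can be illustrated with a toy simulation.
This is a simplified model, not the DataNode code: it assumes 4 writers that
take turns in strict lockstep, 4 partitions, and a single shared round robin
counter. The function place_blocks and its parameters are hypothetical names
used only for this sketch.

```python
import random
from collections import Counter


def place_blocks(policy, writers=4, partitions=4, blocks_per_writer=1000, seed=0):
    """Return one Counter per writer mapping partition index -> block count.

    Toy model: writers request blocks in strict rotation (w0, w1, w2, w3, w0, ...).
    """
    rng = random.Random(seed)
    next_volume = 0  # shared round robin cursor
    usage = [Counter() for _ in range(writers)]
    for _ in range(blocks_per_writer):
        for w in range(writers):
            if policy == "round_robin":
                volume = next_volume
                next_volume = (next_volume + 1) % partitions
            else:  # "random": pick any partition uniformly
                volume = rng.randrange(partitions)
            usage[w][volume] += 1
    return usage


if __name__ == "__main__":
    for policy in ("round_robin", "random"):
        print(policy)
        for w, counts in enumerate(place_blocks(policy)):
            print(f"  writer {w}: {dict(sorted(counts.items()))}")
```

Under this model, round robin pins every writer to a single partition, so all
blocks of a given file end up on one disk, while random placement spreads each
writer's blocks over all four partitions.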
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.