Raid should rearrange the replicas while raiding
------------------------------------------------
Key: MAPREDUCE-1861
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
Fix For: 0.22.0
A raided file introduces extra dependencies among the blocks on the same stripe.
Therefore we need a new way to place the blocks.
It is desirable that a raided file satisfy the following two conditions:
a. Replicas on the same stripe should be on different machines (or racks)
b. Replicas of the same block should be on different racks
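To make the two conditions concrete, here is a minimal sketch of how they could be checked for one stripe. The Replica class and both method names are hypothetical and are not part of HDFS or contrib/raid. Condition (a) is checked in its per-machine form only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StripePlacementCheck {
    /** Hypothetical record of one replica: the block it holds and where it lives. */
    static final class Replica {
        final String block, host, rack;
        Replica(String block, String host, String rack) {
            this.block = block;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Condition (a), machine form: no two replicas in the stripe share a host. */
    static boolean stripeOnDistinctHosts(List<Replica> stripe) {
        Set<String> hosts = new HashSet<>();
        for (Replica r : stripe) {
            if (!hosts.add(r.host)) {
                return false; // host already holds another replica of this stripe
            }
        }
        return true;
    }

    /** Condition (b): all replicas of the same block are on different racks. */
    static boolean blockReplicasOnDistinctRacks(List<Replica> stripe) {
        Map<String, Set<String>> racksByBlock = new HashMap<>();
        for (Replica r : stripe) {
            Set<String> racks = racksByBlock.computeIfAbsent(r.block, k -> new HashSet<>());
            if (!racks.add(r.rack)) {
                return false; // two replicas of this block share a rack
            }
        }
        return true;
    }
}
```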
MAPREDUCE-1831 will try to delete the replicas on the same stripe and the same
machine (a).
At the same time, it will try to maintain the number of distinct racks for
each block (b).
We cannot satisfy (a) and (b) at the same time with the current logic in
BlockPlacementPolicyDefault.chooseTarget().
One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
However, this placement is in general good for all files including the unraided
ones.
It is not clear to us that we can make this good for both raided and unraided
files.
So we propose the following: when raiding a file, we first create one more
off-rack replica of each block (so replication=4).
Then we delete two replicas using the policy in MAPREDUCE-1831
(so replication=2).
This way we can rearrange the replicas to satisfy (a) and (b) at the same time.
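A rough sketch of the deletion step for a single block follows. It is only an illustration under simplifying assumptions, not the actual MAPREDUCE-1831 policy: the Node class, the chooseTwoToKeep method, and the scoring weights are all hypothetical. Given the candidate replicas of one block and the set of hosts already used by other blocks of the stripe, it tries every pair and keeps the pair that first avoids sharing a rack (b) and then touches the fewest already-used hosts (a).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class RaidReplicaTrim {
    /** Hypothetical location of one replica of a block. */
    static final class Node {
        final String host, rack;
        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /**
     * Picks the two replicas to keep; the remaining ones would be deleted.
     * Returns null if fewer than two replicas are given.
     */
    static List<Node> chooseTwoToKeep(List<Node> replicas, Set<String> hostsUsedByStripe) {
        List<Node> best = null;
        int bestScore = Integer.MAX_VALUE;
        for (int i = 0; i < replicas.size(); i++) {
            for (int j = i + 1; j < replicas.size(); j++) {
                Node x = replicas.get(i);
                Node y = replicas.get(j);
                int score = 0;
                // Assumed weighting: a shared rack (violates b) outweighs
                // any number of host collisions within the stripe (violates a).
                if (x.rack.equals(y.rack)) score += 10;
                if (hostsUsedByStripe.contains(x.host)) score++;
                if (hostsUsedByStripe.contains(y.host)) score++;
                if (score < bestScore) {
                    bestScore = score;
                    best = Arrays.asList(x, y);
                }
            }
        }
        return best;
    }
}
```

For example, suppose a block has replicas on h1/r1, h2/r1 and h3/r2, and other blocks of the stripe already use h1 and h3. Every pair of these three replicas either shares a rack or collides with a used host. After adding a fourth replica on h4/r3, the pair (h2, h4) satisfies both conditions, which is the reason for the temporary extra replica.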