[
https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597711#comment-14597711
]
Hudson commented on HDFS-4366:
------------------------------
FAILURE: Integrated in Hadoop-Hdfs-trunk #2165 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2165/])
Move HDFS-4366 to 2.8.0 in CHANGES.txt (wang: rev
5590e914f5889413da9eda047f64842c4b67fe85)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
> Block Replication Policy Implementation May Skip Higher-Priority Blocks for
> Lower-Priority Blocks
> -------------------------------------------------------------------------------------------------
>
> Key: HDFS-4366
> URL: https://issues.apache.org/jira/browse/HDFS-4366
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Derek Dagit
> Assignee: Derek Dagit
> Fix For: 2.8.0
>
> Attachments: HDFS-4366-branch-2.patch, HDFS-4366.patch,
> HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch,
> HDFS-4366.patch, hdfs-4366-unittest.patch
>
>
> In certain cases, higher-priority under-replicated blocks can be skipped by
> the replication policy implementation. The current implementation maintains,
> for each priority level, an index into a list of blocks that are
> under-replicated. Together, the lists compose a priority queue (see note
> later about branch-0.23). In some cases when blocks are removed from a list,
> the caller (BlockManager) properly handles the index into the list from which
> it removed a block. In some other cases, the index remains stationary while
> the list changes. Whenever this happens, and the removed block happened to
> be at or before the index, the implementation will skip over a block when
> selecting blocks for replication work.
> In situations when entire racks are decommissioned, leading to many
> under-replicated blocks, loss of blocks can occur.
> Background: HDFS-1765
> This patch to trunk greatly improved the state of the replication policy
> implementation. Prior to the patch, the following details were true:
> * The block "priority queue" was no such thing: it was really a set of
> trees that held blocks in natural ordering, that is, by the block's ID,
> which resulted in iterator walks over the blocks in pseudo-random order.
> * There was only a single index into an iteration over all of the
> blocks...
> * ... meaning the implementation was only successful in respecting
> priority levels on the first pass. Overall, the behavior was a
> round-robin-type scheduling of blocks.
> After the patch
> * A proper priority queue is implemented, preserving log n operations
> while iterating over blocks in the order added.
> * A separate index for each priority is kept...
> * ... allowing for processing of the highest priority blocks first
> regardless of which priority had last been processed.
> The change was suggested for branch-0.23 as well as trunk, but it does not
> appear to have been pulled in.
> The problem:
> Although the indices are now tracked in a better way, there is a
> synchronization issue, since the indices are managed outside of the methods
> that modify the contents of the queue.
> Removal of a block from a priority level without adjusting the index can mean
> that the index then points to the block after the block it originally pointed
> to. In the next round of scheduling for that priority level, the block
> originally pointed to by the index is skipped.
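The skip described in the issue can be illustrated with a minimal sketch. This is not the BlockManager code: the class ReplicationIndexSketch and its methods are hypothetical names, and one priority level is modeled as a plain list of block names with a saved index marking where the next scheduling round resumes. It only assumes that the index means "next block to schedule".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of ONE priority level. None of these
// names come from the HDFS source; they exist only for illustration.
class ReplicationIndexSketch {
    final List<String> blocks;
    int index = 0; // position of the next block to schedule

    ReplicationIndexSketch(String... names) {
        blocks = new ArrayList<>(Arrays.asList(names));
    }

    // Schedule up to n blocks, resuming at the saved index.
    List<String> chooseBlocks(int n) {
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < n && index < blocks.size()) {
            chosen.add(blocks.get(index++));
        }
        return chosen;
    }

    // Problem case: the list shrinks but the index stays stationary.
    void removeWithoutAdjusting(String name) {
        blocks.remove(name);
    }

    // One possible remedy: removal and index bookkeeping happen together,
    // so the index keeps pointing at the same logical block.
    void removeAndAdjust(String name) {
        int pos = blocks.indexOf(name);
        if (pos < 0) {
            return; // block not present at this priority level
        }
        blocks.remove(pos);
        if (pos < index) {
            index--; // everything after pos shifted left by one
        }
    }

    public static void main(String[] args) {
        ReplicationIndexSketch buggy = new ReplicationIndexSketch("A", "B", "C", "D");
        buggy.chooseBlocks(2);             // schedules A and B; index now points at C
        buggy.removeWithoutAdjusting("A"); // list is B, C, D; index now points at D
        System.out.println("without adjustment: " + buggy.chooseBlocks(2));

        ReplicationIndexSketch fixed = new ReplicationIndexSketch("A", "B", "C", "D");
        fixed.chooseBlocks(2);
        fixed.removeAndAdjust("A");        // index moves back so it still points at C
        System.out.println("with adjustment: " + fixed.chooseBlocks(2));
    }
}
```

In the first run block C is never scheduled in that round; in the second, removal and index adjustment live in the same method, so C is picked up next. Whether the actual patch takes this approach is not stated in this message.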