[ 
https://issues.apache.org/jira/browse/HDFS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561439#comment-13561439
 ] 

Uma Maheswara Rao G commented on HDFS-4366:
-------------------------------------------

Thanks a lot for working on this, Derek.
{code}
Right, but looks like we're still missing a test for the fix here, iiuc your 
first patch here noted that we were failing to update the priority index in a 
couple of places which should mean we could write a test that would fail w/o 
this change because we weren't respecting priority right?
{code}
I agree with Eli. I would also like to see a test for the case you describe, 
where an element is removed but the index is not updated. Most updates to 
neededReplications made from outside happen under the namesystem lock, and 
chooseUnderReplicatedBlocks is also processed under the namesystem lock. The 
issue says that somewhere we missed updating that index. It would be great if 
you could add a test that covers that scenario. Otherwise that case remains 
uncovered.
                
> Block Replication Policy Implementation May Skip Higher-Priority Blocks for 
> Lower-Priority Blocks
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4366
>                 URL: https://issues.apache.org/jira/browse/HDFS-4366
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.5
>            Reporter: Derek Dagit
>            Assignee: Derek Dagit
>         Attachments: HDFS-4366.patch, HDFS-4366.patch, HDFS-4366.patch, 
> HDFS-4366.patch, hdfs-4366-unittest.patch
>
>
> In certain cases, higher-priority under-replicated blocks can be skipped by 
> the replication policy implementation.  The current implementation maintains, 
> for each priority level, an index into a list of blocks that are 
> under-replicated.  Together, the lists compose a priority queue (see note 
> later about branch-0.23).  In some cases when blocks are removed from a list, 
> the caller (BlockManager) properly handles the index into the list from which 
> it removed a block.  In some other cases, the index remains stationary while 
> the list changes.  Whenever this happens, and the removed block happened to 
> be at or before the index, the implementation will skip over a block when 
> selecting blocks for replication work.
> In situations when entire racks are decommissioned, leading to many 
> under-replicated blocks, loss of blocks can occur.
> Background: HDFS-1765
> This patch to trunk greatly improved the state of the replication policy 
> implementation.  Prior to the patch, the following details were true:
>       * The block "priority queue" was no such thing: It was really a set of 
> trees that held blocks in natural ordering, that being by the block's ID, 
> which resulted in iterator walks over the blocks in pseudo-random order.
>       * There was only a single index into an iteration over all of the 
> blocks...
>       * ... meaning the implementation was only successful in respecting 
> priority levels on the first pass.  Overall, the behavior was a 
> round-robin-type scheduling of blocks.
> After the patch
>       * A proper priority queue is implemented, preserving log n operations 
> while iterating over blocks in the order added.
>       * A separate index for each priority is kept...
>       * ... allowing for processing of the highest priority blocks first 
> regardless of which priority had last been processed.
> The change was suggested for branch-0.23 as well as trunk, but it does not 
> appear to have been pulled in.
> The problem:
> Although the indices are now tracked in a better way, there is a 
> synchronization issue since the indices are managed outside of methods to 
> modify the contents of the queue.
> Removal of a block from a priority level without adjusting the index can mean 
> that the index then points to the block after the block it originally pointed 
> to.  In the next round of scheduling for that priority level, the block 
> originally pointed to by the index is skipped.
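
The skip described above can be reproduced with a small stand-alone sketch. 
This is not the BlockManager or UnderReplicatedBlocks code: the class Level, 
its methods, and the block names are hypothetical, and the index here is 
assumed to point at the next block to schedule, so under that convention only 
removals strictly before the index need an adjustment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PriorityIndexSketch {

    // Hypothetical stand-in for a single priority level: a list of block
    // names plus an index to the next block to schedule.
    static final class Level {
        final List<String> blocks = new ArrayList<>();
        int index = 0;

        Level(String... ids) {
            blocks.addAll(Arrays.asList(ids));
        }

        // Removal that leaves the index untouched (the problematic pattern).
        void removeWithoutAdjusting(String id) {
            blocks.remove(id);
        }

        // Removal that shifts the index back when the removed block sat
        // before it, so the index keeps pointing at the same logical block.
        void removeAndAdjust(String id) {
            int pos = blocks.indexOf(id);
            if (pos < 0) {
                return;
            }
            blocks.remove(pos);
            if (pos < index) {
                index--;
            }
        }

        // Hand out up to n blocks starting at the index, advancing it.
        List<String> chooseNext(int n) {
            List<String> chosen = new ArrayList<>();
            while (chosen.size() < n && index < blocks.size()) {
                chosen.add(blocks.get(index++));
            }
            return chosen;
        }
    }

    public static void main(String[] args) {
        Level buggy = new Level("b0", "b1", "b2", "b3");
        buggy.chooseNext(2);                      // schedules b0 and b1
        buggy.removeWithoutAdjusting("b0");       // list shifts, index does not
        System.out.println(buggy.chooseNext(2));  // b2 is never returned

        Level adjusted = new Level("b0", "b1", "b2", "b3");
        adjusted.chooseNext(2);
        adjusted.removeAndAdjust("b0");
        System.out.println(adjusted.chooseNext(2));
    }
}
```

A regression test for the real code could follow the same shape: schedule some 
blocks, remove one that lies before the index, and check that the next round 
still returns the block the index originally pointed to.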

