[ https://issues.apache.org/jira/browse/HDFS-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852131#comment-13852131 ]
Buddy commented on HDFS-5483:
-----------------------------
To clarify: I had the error case wrong in my comment above.
I think we are hitting this when the data node reports multiple replicas of
the same block.
It looks like we have two replicas of the block on two different storages. The
data node has both storages mounted: one read-write and the other read-only.
It reports the replica on the first storage as read-write and the replica on
the second storage as read-only.
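To make the case concrete, here is a minimal sketch of how a report that lists the same block under two storages could be detected before the diff is computed. It is a simplified model, not the actual BlockManager code: the class DuplicateReplicaSketch, the type ReportedReplica, the method findDuplicates, and the storage names are all hypothetical and only illustrate the shape of the check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateReplicaSketch {

    /** Hypothetical stand-in for one reported replica: a block ID plus the storage it was reported on. */
    static final class ReportedReplica {
        final long blockId;
        final String storageId;

        ReportedReplica(long blockId, String storageId) {
            this.blockId = blockId;
            this.storageId = storageId;
        }
    }

    /**
     * Returns every replica whose block ID was already reported on a
     * different storage earlier in the same report. A caller could skip
     * or log these entries instead of tripping an assertion.
     */
    static List<ReportedReplica> findDuplicates(List<ReportedReplica> report) {
        // Remembers the first storage on which each block ID was seen.
        Map<Long, String> firstStorage = new HashMap<>();
        List<ReportedReplica> duplicates = new ArrayList<>();
        for (ReportedReplica replica : report) {
            String seen = firstStorage.putIfAbsent(replica.blockId, replica.storageId);
            if (seen != null && !seen.equals(replica.storageId)) {
                duplicates.add(replica);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Same block (1001) reported on a read-write and a read-only storage.
        List<ReportedReplica> report = new ArrayList<>();
        report.add(new ReportedReplica(1001L, "storage-rw"));
        report.add(new ReportedReplica(1002L, "storage-rw"));
        report.add(new ReportedReplica(1001L, "storage-ro"));

        for (ReportedReplica dup : findDuplicates(report)) {
            System.out.println("duplicate block " + dup.blockId + " on " + dup.storageId);
        }
    }
}
```

Whether the second replica should be dropped, logged, or tracked separately is a design decision for the real fix; the sketch only shows that the duplicate is cheap to spot up front.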
> Make reportDiff resilient to malformed block reports
> ----------------------------------------------------
>
> Key: HDFS-5483
> URL: https://issues.apache.org/jira/browse/HDFS-5483
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: Heterogeneous Storage (HDFS-2832)
> Reporter: Arpit Agarwal
>
> {{BlockManager#reportDiff}} can cause an assertion failure in
> {{BlockInfo#moveBlockToHead}} if the block report shows the same block as
> belonging to more than one storage.
> The issue is that {{moveBlockToHead}} assumes it will find the
> DatanodeStorageInfo for the given block.
> Exception details:
> {code}
> java.lang.AssertionError: Index is out of bound
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.setNext(BlockInfo.java:152)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:351)
>     at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:243)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1841)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1709)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1637)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:984)
>     at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:165)
> {code}