[ 
https://issues.apache.org/jira/browse/HDFS-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641138#comment-13641138
 ] 

Konstantin Shvachko commented on HDFS-4746:
-------------------------------------------

This is a race between completeFile() and blockReceivedAndDeleted() in which 
completeFile() wins. 
It can be observed in two scenarios.
# In HA, when the StandbyNode sees the completeFile transaction before learning 
about the last block via blockReceivedAndDeleted().
# The DN sends blockReceivedAndDeleted() for the last block but never gets a 
reply. At this point the DN does not know whether the NN applied the request, 
so it reports the same replica again. The NN was merely slow: it did learn 
about the replica from the DN's first attempt but did not reply in time. 
It then closed the file successfully, and only after that received the second 
blockReceivedAndDeleted().

This results in the following exception.
{code}
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReceivedAndDeleted from 10.191.127.183:57418: error: java.lang.ClassCastException: org.apache.hadoop.hdfs.server.namenode.INodeFile cannot be cast to org.apache.hadoop.hdfs.server.blockmanagement.MutableBlockCollection
java.lang.ClassCastException: org.apache.hadoop.hdfs.server.namenode.INodeFile cannot be cast to org.apache.hadoop.hdfs.server.blockmanagement.MutableBlockCollection
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2117)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2605)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2582)
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2649)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:922)
        at org.apache.hadoop.hdfs.server.namenode.ConsensusNodeRpcServer.blockReceivedAndDeleted(ConsensusNodeRpcServer.java:75)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:180)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:18301)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)
{code}
This appears to have been introduced by the BlockManager refactoring in 
HDFS-2106. A similar problem was reported for append in HDFS-3385 and HDFS-3394.
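
To make the failure mode concrete, here is a minimal, self-contained sketch of an unconditional cast failing on a late block report, next to one possible guard. Every type and method name in it is a hypothetical stand-in (ClosedFile plays the role of INodeFile, OpenFile the role of INodeFileUnderConstruction); it is not the actual BlockManager code, and the instanceof check is only one way to tolerate the late report, not necessarily the fix that should be adopted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the HDFS types named in the stack trace.
// None of this is the real BlockManager implementation.
class LateBlockReportSketch {
    interface BlockCollection {
        String getName();
    }

    interface MutableBlockCollection extends BlockCollection {
        void commitLastBlock(long blockId);
    }

    // Plays the role of INodeFile: the file is closed and no longer mutable.
    static class ClosedFile implements BlockCollection {
        public String getName() { return "closed"; }
    }

    // Plays the role of INodeFileUnderConstruction: still open for writes.
    static class OpenFile implements MutableBlockCollection {
        final List<Long> committed = new ArrayList<>();
        public String getName() { return "open"; }
        public void commitLastBlock(long blockId) { committed.add(blockId); }
    }

    // Unconditional cast: throws ClassCastException when the file is closed.
    static void addStoredBlockUnchecked(BlockCollection bc, long blockId) {
        ((MutableBlockCollection) bc).commitLastBlock(blockId);
    }

    // Guarded variant: a report for an already closed file is skipped.
    // Returns true only if the block was committed to an open file.
    static boolean addStoredBlockChecked(BlockCollection bc, long blockId) {
        if (!(bc instanceof MutableBlockCollection)) {
            return false;
        }
        ((MutableBlockCollection) bc).commitLastBlock(blockId);
        return true;
    }

    public static void main(String[] args) {
        BlockCollection closed = new ClosedFile();
        try {
            addStoredBlockUnchecked(closed, 42L);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed on a closed file");
        }
        System.out.println("checked: " + addStoredBlockChecked(closed, 42L));
    }
}
```

In this sketch the guarded method simply skips the commit step for a closed file; whether the real code should still record the replica location in that case is a separate question that the sketch does not address.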
                
> ClassCastException in BlockManager.addStoredBlock() due to that blockReceived 
> came after file was closed.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4746
>                 URL: https://issues.apache.org/jira/browse/HDFS-4746
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.3-alpha
>            Reporter: Konstantin Shvachko
>
> In some cases the last block replica of a file can be reported after the file 
> was closed. In this case the file's inode is of type INodeFile. 
> BlockManager.addStoredBlock(), however, expects it to be 
> INodeFileUnderConstruction, so the cast to 
> MutableBlockCollection fails.
