[jira] [Commented] (HDFS-3874) Exception when client reports bad checksum to NN

Kihwal Lee (JIRA) Mon, 26 Nov 2012 06:33:05 -0800

    [ https://issues.apache.org/jira/browse/HDFS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503801#comment-13503801 ]


Kihwal Lee commented on HDFS-3874:
----------------------------------

In branch-1, srcNode is created from the result of getRemoteSocketAddress(), so 
it contained the peer's address instead of the null passed down from the 
client. If we do something equivalent, the last node in the pipeline will be 
able to report the second-to-last one, but that is not enough to cover all the 
cases. The last node itself needs to be included when determining up to which 
offset the data is sound. If the last DN simply disappears, as it does today, 
this won't happen.

Since HDFS-3875 talks about this specific issue, we will continue the 
discussion there.
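
For readers unfamiliar with the branch-1 behavior mentioned above, here is a 
minimal, standalone sketch of deriving an upstream peer's address from the 
accepted socket rather than from a value handed down by the client. It is not 
HDFS code: the class PeerAddressSketch and the helper upstreamPeer() are 
hypothetical names used only for illustration, and only the standard 
java.net.Socket API is assumed.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

public class PeerAddressSketch {

    /**
     * Hypothetical helper (not part of HDFS): returns "ip:port" for the
     * remote end of the given socket, or null if the socket is not connected.
     */
    static String upstreamPeer(Socket s) {
        // getRemoteSocketAddress() returns null for an unconnected socket.
        SocketAddress remote = s.getRemoteSocketAddress();
        if (!(remote instanceof InetSocketAddress)) {
            return null;
        }
        InetSocketAddress inet = (InetSocketAddress) remote;
        if (inet.getAddress() == null) {
            return null; // unresolved address; nothing reliable to report
        }
        return inet.getAddress().getHostAddress() + ":" + inet.getPort();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The ServerSocket stands in for a downstream node; "upstream" stands
        // in for the node (or client) that connects to it.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket upstream = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept();
             Socket unconnected = new Socket()) {
            System.out.println("peer seen by receiver: " + upstreamPeer(accepted));
            System.out.println("unconnected socket:    " + upstreamPeer(unconnected));
        }
    }
}
```

The receiver learns the sender's address from the connection itself, so it 
does not depend on the sender supplying one. As the comment notes, this only 
lets a node identify the node directly upstream of it; it does not by itself 
cover the last node in the pipeline.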
                
> Exception when client reports bad checksum to NN
> ------------------------------------------------
>
>                 Key: HDFS-3874
>                 URL: https://issues.apache.org/jira/browse/HDFS-3874
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 2.0.0-alpha, 0.23.5
>            Reporter: Todd Lipcon
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> We see the following exception in our logs on a cluster:
> {code}
> 2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> NameNode.reportBadBlocks
> 2012-08-27 16:34:30,400 ERROR 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot mark 
> blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
> as corrupt because datanode :0 does not exist
> 2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 46 on 8020, call 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 
> 172.29.97.219:43805: error: java.io.IOException: Cannot mark 
> blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
> as corrupt because datanode :0 does not exist
> java.io.IOException: Cannot mark 
> blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
> as corrupt because datanode :0 does not exist
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001)
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537)
>         at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
