[ https://issues.apache.org/jira/browse/HDFS-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613412#comment-15613412 ]
Wei-Chiu Chuang commented on HDFS-11070:
----------------------------------------
This bug was reproduced using the test in HDFS-11056.
However, it is not reproducible every time, due to its race-condition nature.
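The failure mode can be modeled outside Hadoop as a stale-key lookup. The following is a simplified, self-contained sketch (all class, field, and method names here are illustrative, not Hadoop APIs): a reader captures a block's genstamp, a concurrent append bumps the genstamp, and the reader's subsequent volume lookup with the stale key returns null, so dereferencing it throws the NPE.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the BlockSender race. A reader resolves a replica by
// (blockId, genstamp); a concurrent append moves the replica to a new
// genstamp between the reader's capture and its lookup.
public class GenstampRace {
  static final Map<String, String> volumeByBlockKey = new ConcurrentHashMap<>();

  static String key(long blockId, long genstamp) {
    return blockId + "_" + genstamp;
  }

  public static void main(String[] args) {
    long blockId = 1073741825L;
    volumeByBlockKey.put(key(blockId, 1192L), "/data/data1");

    // Reader captures the genstamp it believes is current.
    long readerGenstamp = 1192L;

    // Concurrent append: the replica moves to genstamp 1193 before the
    // reader performs its volume lookup.
    volumeByBlockKey.remove(key(blockId, 1192L));
    volumeByBlockKey.put(key(blockId, 1193L), "/data/data1");

    // The reader's stale key now resolves to null -> NPE on dereference,
    // mimicking datanode.data.getVolume(block).obtainReference().
    String volume = volumeByBlockKey.get(key(blockId, readerGenstamp));
    try {
      System.out.println(volume.length());
    } catch (NullPointerException e) {
      System.out.println("NPE: stale genstamp " + readerGenstamp);
    }
  }
}
```

In the real code path the analogous dereference is `datanode.data.getVolume(block).obtainReference()`, where `getVolume(block)` can return null once the on-disk replica has been renamed under a newer genstamp.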
> NPE in BlockSender due to race condition
> ----------------------------------------
>
> Key: HDFS-11070
> URL: https://issues.apache.org/jira/browse/HDFS-11070
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Wei-Chiu Chuang
>
> Saw the following NPE in a unit test:
> {quote}
> 2016-10-27 14:42:58,450 ERROR DataNode - 127.0.0.1:51987:DataXceiver error processing READ_BLOCK operation src: /127.0.0.1:52429 dst: /127.0.0.1:51987
> java.lang.NullPointerException
>   at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:284)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}
> The NPE occurred here:
> {code:title=BlockSender.<init>}
> // Obtain a reference before reading data
> this.volumeRef = datanode.data.getVolume(block).obtainReference();
> {code}
> Right before the NPE, a few lines of debug messages indicated that the replica had been appended to and updated.
> {quote}
> 2016-10-27 14:42:58,442 DEBUG DataNode - block=BP-1071315328-172.16.1.88-1477604513635:blk_1073741825_1192, replica=FinalizedReplica, blk_1073741825_1192, FINALIZED
>   getNumBytes()     = 192
>   getBytesOnDisk()  = 192
>   getVisibleLength()= 192
>   getVolume()       = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI()     = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-27 14:42:58,442 INFO FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1192, FINALIZED
>   getNumBytes()     = 192
>   getBytesOnDisk()  = 192
>   getVisibleLength()= 192
>   getVolume()       = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
>   getBlockURI()     = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/finalized/subdir0/subdir0/blk_1073741825
> 2016-10-27 14:42:58,442 DEBUG FsDatasetCache - Block with id 1073741825, pool BP-1071315328-172.16.1.88-1477604513635 does not need to be uncached, because it is not currently in the mappableBlockMap.
> 2016-10-27 14:42:58,450 DEBUG LocalReplica - Renaming /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/finalized/subdir0/subdir0/blk_1073741825_1192.meta to /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/rbw/blk_1073741825_1193.meta
> 2016-10-27 14:42:58,450 DEBUG LocalReplica - Renaming /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/finalized/subdir0/subdir0/blk_1073741825 to /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/rbw/blk_1073741825, file length=192
> 2016-10-27 14:42:58,450 DEBUG DataNode - writeTo blockfile is /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/rbw/blk_1073741825 of size 192
> 2016-10-27 14:42:58,450 DEBUG DataNode - writeTo metafile is /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-1071315328-172.16.1.88-1477604513635/current/rbw/blk_1073741825_1193.meta of size 11
> {quote}
> The block object's genstamp should have matched that of the on-disk replica. However, the log suggests the replica's genstamp was updated after the following check:
> {code:title=BlockSender.<init>}
> try (AutoCloseableLock lock = datanode.data.acquireDatasetLock()) {
>   replica = getReplica(block, datanode);
>   replicaVisibleLength = replica.getVisibleLength();
> }
> // if there is a write in progress
> ChunkChecksum chunkChecksum = null;
> if (replica.getState() == ReplicaState.RBW) {
>   final ReplicaInPipeline rbw = (ReplicaInPipeline) replica;
>   waitForMinLength(rbw, startOffset + length);
>   chunkChecksum = rbw.getLastChecksumAndDataLen();
> }
> {code}
> In summary, I think the assumption here is not valid: the replica's state is checked only once, and a write in progress may begin after that check.
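One way to close this window would be to perform the replica lookup and the volume-reference acquisition under the same dataset lock, and to fail with a descriptive error rather than an NPE if the replica has moved. The following is a minimal sketch of that idea only; the names are illustrative stand-ins, not Hadoop APIs, and this is not the actual HDFS fix:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: hold one lock across both the replica lookup and the
// volume-reference acquisition, so a concurrent append cannot invalidate
// the replica between the two steps.
public class GuardedLookup {
  static final ReentrantLock datasetLock = new ReentrantLock();
  static final Map<Long, String> volumeByBlock = new ConcurrentHashMap<>();

  static String obtainVolumeReference(long blockId) {
    datasetLock.lock();
    try {
      String volume = volumeByBlock.get(blockId);
      if (volume == null) {
        // Descriptive failure instead of a NullPointerException.
        throw new IllegalStateException(
            "Replica " + blockId + " changed during read setup");
      }
      // The reference acquisition happens here, still under the lock.
      return volume;
    } finally {
      datasetLock.unlock();
    }
  }

  public static void main(String[] args) {
    volumeByBlock.put(1073741825L, "/data/data1");
    System.out.println(obtainVolumeReference(1073741825L));
  }
}
```

An alternative, if holding the lock that long is too costly, is to re-validate the replica's genstamp after acquiring the volume reference and retry the lookup when it has changed.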
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)