[ https://issues.apache.org/jira/browse/HDFS-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Chen updated HDFS-9781:
----------------------------
Attachment: HDFS-9781.01.patch
Thanks for creating this [~jojochuang].
The 'A failed test' link is no longer valid. :( But I've managed to reproduce
this at about 1% frequency. I think there are 2 problems here:
# The test times out without any useful information, so people have to read the
code to find out why it timed out.
# The NPE itself.
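For the first problem, one way to keep a test from hanging silently is to bound the wait and capture whatever the worker thread threw. Below is a minimal, self-contained Java sketch under that assumption; {{WorkerFailureSketch}} and {{runAndDiagnose}} are hypothetical names, and this is neither the change in HDFS-9781.01.patch nor an existing Hadoop test utility.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical test helper (illustration only, not part of Hadoop). */
final class WorkerFailureSketch {
    private WorkerFailureSketch() {}

    /**
     * Runs task on its own thread and waits at most timeoutMs for it.
     * Returns null on success, otherwise a message describing the failure,
     * so a test can fail fast with that message instead of hanging.
     */
    static String runAndDiagnose(Runnable task, long timeoutMs) {
        AtomicReference<Throwable> error = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                error.set(t); // keep the failure instead of only printing it to stderr
            } finally {
                done.countDown(); // signal completion whether or not the task failed
            }
        }, "worker");
        worker.setDaemon(true); // a hung worker must not keep the JVM alive
        worker.start();
        try {
            if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return "timed out after " + timeoutMs + " ms; worker state: " + worker.getState();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted while waiting for the worker";
        }
        Throwable t = error.get();
        return t == null ? null : "worker threw " + t;
    }
}
```

A test could then pass the returned message straight to JUnit's {{assertNull(String message, Object object)}}, so an NPE in the block report thread shows up as a failure with the exception text rather than as a bare timeout. Since, as noted below, the test does not care which exception is thrown, the sketch only surfaces it and does not match on its message.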
Patch 1 is attached to address them:
# Adds more diagnostic information to the test, and also waits for the block
report (BR) to be received before releasing the reference. BTW, the test doesn't
care whether an exception is thrown, so IMO no {{assertExceptionContains}} is needed.
# This comes from the changes in HDFS-9701: during {{wait}}, the thread releases
the lock and is put on hold, so other threads may proceed ({{getBlockReports}}, in
this case). Since {{volumeMap}} has not yet been cleared, the BR thread can get a
null volume. Eddy and I discussed this in HDFS-9701, but at that time I was using
{{Thread.sleep}}, which holds the lock. A later findbugs warning made me switch to
{{wait}} (which IMO is the right thing to do), but then the sequence should have
been modified to make sure internal state such as {{volumeMap}} stays consistent
while the lock is released.
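To make the ordering problem concrete, here is a minimal, self-contained Java sketch. It is a heavily simplified model, not the real {{FsDatasetImpl}}/{{FsVolumeList}} code: the class {{VolumeRemovalSketch}}, its fields and its methods are all hypothetical, and "clean up {{volumeMap}} before waiting" is an assumed illustration of the reordering, not the contents of HDFS-9781.01.patch. What it does show is standard Java behavior: {{Object.wait()}} releases the monitor (unlike {{Thread.sleep}}), so another {{synchronized}} method can run and observe whatever state {{removeVolume}} left behind before waiting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the race (illustration only). */
class VolumeRemovalSketch {
    /** Stand-in for a storage volume; only carries a name. */
    static final class Volume {
        final String name;
        Volume(String name) { this.name = name; }
    }

    private final Map<String, Volume> volumes = new HashMap<>(); // active volumes by name
    private final Map<Long, String> volumeMap = new HashMap<>(); // block ID -> volume name
    private final boolean clearMapBeforeWait;                    // true = reordered variant
    private int refCount = 0;                                    // outstanding volume references

    VolumeRemovalSketch(boolean clearMapBeforeWait) {
        this.clearMapBeforeWait = clearMapBeforeWait;
    }

    synchronized void addBlock(long blockId, String volumeName) {
        volumes.computeIfAbsent(volumeName, Volume::new);
        volumeMap.put(blockId, volumeName);
    }

    synchronized void obtainReference() { refCount++; }

    synchronized void releaseReference() {
        refCount--;
        notifyAll(); // wake removeVolume() if it is waiting for references to drain
    }

    /** Resolves each block's volume; throws NPE if volumeMap points at a removed volume. */
    synchronized List<String> getBlockReports() {
        List<String> report = new ArrayList<>();
        for (Map.Entry<Long, String> e : volumeMap.entrySet()) {
            Volume v = volumes.get(e.getValue());
            report.add(e.getKey() + "@" + v.name); // v is null for a half-removed volume
        }
        return report;
    }

    synchronized void removeVolume(String name) throws InterruptedException {
        volumes.remove(name);
        if (clearMapBeforeWait) {
            volumeMap.values().removeIf(name::equals); // consistent before the lock is released
        }
        while (refCount > 0) {
            wait(); // releases the monitor, so other synchronized calls can run here
        }
        if (!clearMapBeforeWait) {
            volumeMap.values().removeIf(name::equals); // too late: a block report may already have run
        }
    }

    /** Forces the interleaving; returns true if getBlockReports() hit an NPE. */
    static boolean reportFailsDuringRemoval(boolean clearMapBeforeWait) {
        VolumeRemovalSketch ds = new VolumeRemovalSketch(clearMapBeforeWait);
        ds.addBlock(1L, "data0");
        ds.addBlock(2L, "data1");
        ds.obtainReference(); // e.g. an in-flight writer still holds a reference
        Thread remover = new Thread(() -> {
            try {
                ds.removeVolume("data0");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        remover.start();
        try {
            // Spin until removeVolume() is parked inside wait().
            while (remover.isAlive() && remover.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            boolean npe = false;
            try {
                ds.getBlockReports();
            } catch (NullPointerException e) {
                npe = true;
            }
            ds.releaseReference(); // lets removeVolume() finish
            remover.join();
            return npe;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this model, {{reportFailsDuringRemoval(false)}} reproduces the NPE deterministically, while {{reportFailsDuringRemoval(true)}} does not. The trade-off of the reordering is that a block report taken during the wait no longer lists replicas on the departing volume, which seems acceptable for a volume that is already being removed.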
[~eddyxu], could you please review? Thanks.
> FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
> -------------------------------------------------------------------------
>
> Key: HDFS-9781
> URL: https://issues.apache.org/jira/browse/HDFS-9781
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0
> Environment: Jenkins
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9781.01.patch
>
>
> FsDatasetImpl#getBlockReports occasionally throws an NPE. The NPE caused
> TestFsDatasetImpl#testRemoveVolumeBeingWritten to time out, because the test
> waits for the call to FsDatasetImpl#getBlockReports to complete without
> exceptions.
> Additionally, the test should be updated to check for the expected exception
> using {{GenericTestUtils.assertExceptionContains()}}.
> {noformat}
> Exception in thread "Thread-20" java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1709)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1BlockReportThread.run(TestFsDatasetImpl.java:587)
> 2016-02-08 15:47:30,379 [Thread-21] WARN impl.TestFsDatasetImpl (TestFsDatasetImpl.java:run(606)) - Exception caught. This should not affect the test
> java.io.IOException: Failed to move meta file for ReplicaBeingWritten, blk_0_0, RBW
> getNumBytes() = 0
> getBytesOnDisk() = 0
> getVisibleLength()= 0
> getVolume() = /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current
> getBlockFile() = /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0
> bytesAcked=0
> bytesOnDisk=0 from /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta to /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:857)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addFinalizedBlock(BlockPoolSlice.java:295)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addFinalizedBlock(FsVolumeImpl.java:819)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1620)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1601)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1ResponderThread.run(TestFsDatasetImpl.java:603)
> Caused by: java.io.IOException: renameTo(src=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta, dst=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta) failed.
> 	at org.apache.hadoop.io.nativeio.NativeIO.renameTo(NativeIO.java:873)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:855)
> 	... 5 more
> 2016-02-08 15:47:34,381 [Thread-19] INFO impl.FsDatasetImpl (FsVolumeList.java:waitVolumeRemoved(287)) - Volume reference is released.
> 2016-02-08 15:47:34,384 [Thread-19] INFO impl.TestFsDatasetImpl (TestFsDatasetImpl.java:testRemoveVolumeBeingWritten(622)) - Volumes removed
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)