[ https://issues.apache.org/jira/browse/HDFS-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458587#comment-15458587 ]
Wei-Chiu Chuang commented on HDFS-9781:
---------------------------------------

Hi [~manojg], and thanks to everyone here for the comments.

When I opened this issue I did not understand the code inside FsDatasetImpl well enough. Now that I understand it better, the bug itself does look potentially impactful, especially during a hot-swap operation, I suppose?

[~manojg] I took a quick look at the patch, and it looks good to me in general. Can you also print the block ID, in addition to the volume, when logging the warning? This will help support engineers debug if something similar happens.

{code}
LOG.warn("Replica volume: " + b.getVolume().getStorageID() + " " + "missing. Probably being removed!");
{code}

Also, here in the test:

{code}
// Slow down while we're holding the reference to the volume.
// As we finalize a block, the volume is removed in parallel.
// Ignore any interrupts coming out of volume shutdown.
try {
  Thread.sleep(1000);
} catch (Throwable t) {
  // ignore
}
{code}

Please catch InterruptedException here instead of Throwable. It is not a good idea to catch a Throwable and ignore it.

Thanks!

> FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
> -------------------------------------------------------------------------
>
>                 Key: HDFS-9781
>                 URL: https://issues.apache.org/jira/browse/HDFS-9781
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0-alpha1
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-9781.002.patch, HDFS-9781.01.patch
>
>
> FsDatasetImpl#getBlockReports occasionally throws NPE. The NPE caused
> TestFsDatasetImpl#testRemoveVolumeBeingWritten to time out, because the test
> waits for the call to FsDatasetImpl#getBlockReports to complete without
> exceptions.
> Additionally, the test should be updated to identify an expected exception,
> using {{GenericTestUtils.assertExceptionContains()}}
> {noformat}
> Exception in thread "Thread-20" java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1709)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1BlockReportThread.run(TestFsDatasetImpl.java:587)
> 2016-02-08 15:47:30,379 [Thread-21] WARN impl.TestFsDatasetImpl (TestFsDatasetImpl.java:run(606)) - Exception caught. This should not affect the test
> java.io.IOException: Failed to move meta file for ReplicaBeingWritten, blk_0_0, RBW
>   getNumBytes() = 0
>   getBytesOnDisk() = 0
>   getVisibleLength()= 0
>   getVolume() = /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current
>   getBlockFile() = /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0
>   bytesAcked=0
>   bytesOnDisk=0 from /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta to /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:857)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addFinalizedBlock(BlockPoolSlice.java:295)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addFinalizedBlock(FsVolumeImpl.java:819)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1620)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1601)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1ResponderThread.run(TestFsDatasetImpl.java:603)
> Caused by: java.io.IOException: renameTo(src=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta, dst=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta) failed.
>     at org.apache.hadoop.io.nativeio.NativeIO.renameTo(NativeIO.java:873)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:855)
>     ... 5 more
> 2016-02-08 15:47:34,381 [Thread-19] INFO impl.FsDatasetImpl (FsVolumeList.java:waitVolumeRemoved(287)) - Volume reference is released.
> 2016-02-08 15:47:34,384 [Thread-19] INFO impl.TestFsDatasetImpl (TestFsDatasetImpl.java:testRemoveVolumeBeingWritten(622)) - Volumes removed
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
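
For illustration only, the two review requests in the comment (include the block ID in the warning, and catch InterruptedException rather than Throwable) might look roughly like the standalone sketch below. This is not the actual patch: the class name, both helper methods, and the exact message wording are hypothetical, and the sketch deliberately avoids Hadoop types so that it compiles on its own.

```java
public class ReviewSketch {

    // Hypothetical helper: builds a warning that names both the block and
    // the storage ID, so one log line identifies the affected replica.
    // The wording is an assumption, not the message used in the patch.
    static String missingVolumeWarning(long blockId, String storageId) {
        return "Replica volume: " + storageId + " missing for block "
            + blockId + ". Probably being removed!";
    }

    // Hypothetical helper: sleeps and ignores only InterruptedException,
    // so any other failure still propagates instead of being swallowed.
    // Returns true if the sleep was interrupted.
    static boolean sleepIgnoringInterrupt(long millis) {
        try {
            Thread.sleep(millis);
            return false;
        } catch (InterruptedException e) {
            // Interrupts are expected while the volume shuts down; ignore.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(missingVolumeWarning(0L, "DS-example"));
        sleepIgnoringInterrupt(10);
    }
}
```

Narrowing the catch clause matters in a test like this one because a swallowed Throwable would also hide assertion errors and unexpected runtime exceptions raised inside the try block.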