[ 
https://issues.apache.org/jira/browse/HDFS-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458587#comment-15458587
 ] 

Wei-Chiu Chuang commented on HDFS-9781:
---------------------------------------

Hi [~manojg] and thanks everyone here for comments. When I opened this issue I 
did not understand the code inside FsDatasetImpl well enough, but now that I 
understand the code better, the bug itself does look like potentially 
impactful, especially during a hotswap operation, I suppose?

[~manojg] I took a quick look at the patch and which looks good to me in 
general. Can you also print the block id when printing the warning log in 
addition to the volume? This will help debugging for supporters if something 
similar happens.
{code}
LOG.warn("Replica volume: " + b.getVolume().getStorageID() + " " +
              "missing. Probably being removed!");
{code}

Also here in the test
{code}
// Slow down while we're holding the reference to the volume.
          // As we finalize a block, the volume is removed in parallel.
          // Ignore any interrupts coming out of volume shutdown.
          try {
            Thread.sleep(1000);
          } catch (Throwable t) {
            // ignore
          }
{code}
Please catch InterruptedException here instead of Throwable. Not a good idea to 
catch a Throwable and ignore it.

Thanks!

> FsDatasetImpl#getBlockReports can occasionally throw NullPointerException
> -------------------------------------------------------------------------
>
>                 Key: HDFS-9781
>                 URL: https://issues.apache.org/jira/browse/HDFS-9781
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0-alpha1
>         Environment: Jenkins
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-9781.002.patch, HDFS-9781.01.patch
>
>
> FsDatasetImpl#getBlockReports occasionally throws NPE. The NPE caused 
> TestFsDatasetImpl#testRemoveVolumeBeingWritten to time out, because the test 
> waits for the call to FsDatasetImpl#getBlockReports to complete without 
> exceptions.
> Additionally, the test should be updated to identify an expected exception, 
> using {{GenericTestUtils.assertExceptionContains()}}
> {noformat}
> Exception in thread "Thread-20" java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1709)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1BlockReportThread.run(TestFsDatasetImpl.java:587)
> 2016-02-08 15:47:30,379 [Thread-21] WARN  impl.TestFsDatasetImpl 
> (TestFsDatasetImpl.java:run(606)) - Exception caught. This should not affect 
> the test
> java.io.IOException: Failed to move meta file for ReplicaBeingWritten, 
> blk_0_0, RBW
>   getNumBytes()     = 0
>   getBytesOnDisk()  = 0
>   getVisibleLength()= 0
>   getVolume()       = 
> /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current
>   getBlockFile()    = 
> /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0
>   bytesAcked=0
>   bytesOnDisk=0 from 
> /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta
>  to 
> /home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:857)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addFinalizedBlock(BlockPoolSlice.java:295)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addFinalizedBlock(FsVolumeImpl.java:819)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1620)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1601)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl$1ResponderThread.run(TestFsDatasetImpl.java:603)
> Caused by: java.io.IOException: 
> renameTo(src=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/rbw/blk_0_0.meta,
>  
> dst=/home/weichiu/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/Nmi6rYndvr/data0/current/bpid-0/current/finalized/subdir0/subdir0/blk_0_0.meta)
>  failed.
>         at org.apache.hadoop.io.nativeio.NativeIO.renameTo(NativeIO.java:873)
>         at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.moveBlockFiles(FsDatasetImpl.java:855)
>         ... 5 more
> 2016-02-08 15:47:34,381 [Thread-19] INFO  impl.FsDatasetImpl 
> (FsVolumeList.java:waitVolumeRemoved(287)) - Volume reference is released.
> 2016-02-08 15:47:34,384 [Thread-19] INFO  impl.TestFsDatasetImpl 
> (TestFsDatasetImpl.java:testRemoveVolumeBeingWritten(622)) - Volumes removed
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to