[
https://issues.apache.org/jira/browse/HBASE-17501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883108#comment-15883108
]
stack commented on HBASE-17501:
-------------------------------
Thank you for the patch [~lumost]. It looks good. Is there a utility class
adjacent that you could move this into....
try {
  // attempt to seek inside of current blockReader
  istream.seek(seekPoint);
} catch (NullPointerException | IOException e) {
  // if the seek throws a NullPointerException or IOException, attempt to seek
  // on an alternative copy of the data; this can occur if the blockReader
  // on the DFSInputStream is null
  istream.seekToNewSource(seekPoint);
}
... since it repeats.
I like the reseek when NPE. You think we should reseek on an IOE too?
Thanks boss.
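For illustration only, here is a minimal sketch of the kind of shared helper
the review is asking about; the class and method names (SeekUtil,
seekWithFallback) are hypothetical and are not part of the attached patch:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataInputStream;

  public final class SeekUtil {
    private SeekUtil() {}

    /**
     * Seek to seekPoint, falling back to an alternate replica when the
     * current block reader is unusable.
     */
    public static void seekWithFallback(FSDataInputStream istream, long seekPoint)
        throws IOException {
      try {
        // normal case: seek within the current blockReader
        istream.seek(seekPoint);
      } catch (NullPointerException | IOException e) {
        // DFSInputStream can throw an NPE when its blockReader is null
        // (e.g. after its datanode is decommissioned and terminated);
        // retry the seek against another copy of the data
        istream.seekToNewSource(seekPoint);
      }
    }
  }

Each call site that currently repeats the try/catch would then collapse to
SeekUtil.seekWithFallback(istream, seekPoint).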
> NullPointerException after Datanodes Decommissioned and Terminated
> ------------------------------------------------------------------
>
> Key: HBASE-17501
> URL: https://issues.apache.org/jira/browse/HBASE-17501
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
> Reporter: Patrick Dignan
> Priority: Minor
> Attachments: HBASE_17501.patch
>
>
> We recently encountered an interesting NullPointerException in HDFS that
> bubbles up to HBase and is resolved by restarting the regionserver. The
> issue appeared while we were replacing a set of nodes in one of our
> clusters with a new set. We did the following:
> 1. Turn off the HBase balancer
> 2. Gracefully move the regions off the nodes we’re shutting off using a tool
> we wrote to do so
> 3. Decommission the datanodes using the HDFS exclude hosts file and hdfs
> dfsadmin -refreshNodes
> 4. Wait for the datanodes to decommission fully
> 5. Terminate the VMs that the datanode instances were running inside.
> A few notes: we did not shut down the datanode processes, so the nodes
> were not marked as dead by the namenode. We simply terminated the datanode
> VMs (in this case AWS instances), and the nodes were marked as
> decommissioned. We run our clusters with DNS, and when we terminate VMs,
> the associated CNAME record is removed and no longer resolves. The errors
> do not seem to go away without a regionserver restart.
> After we did this, the remaining regionservers started throwing
> NullPointerExceptions with the following stack trace:
> 2017-01-19 23:09:05,638 DEBUG org.apache.hadoop.hbase.ipc.RpcServer:
> RpcServer.RW.fifo.Q.read.handler=80,queue=14,port=60020: callId: 1727723891
> service: ClientService methodName: Scan size: 216 connection: 172.16.36.128:31538
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2214)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:204)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1564)
> at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1434)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1682)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:266)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:642)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:592)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:294)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:199)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:343)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:198)
> at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2106)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2096)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5544)
> at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2569)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2555)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2536)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2405)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33738)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> ... 3 more