[ 
https://issues.apache.org/jira/browse/HBASE-20350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429127#comment-16429127
 ] 

Appy commented on HBASE-20350:
------------------------------

This bug is breakdown of our very basic invariant - scanner.next() doesn't 
return null.
Basic pattern in scanners is, after each scanner.next(), do a scanner.peek() to 
make sure that next call to next() won't be null.

In this bug, when PQ.poll() is called, it calls peek() on scanners and gets 
null from some scanner, which is bad.
The fact that the poll was done during close() which in turn was started by 
lease expiry doesn't necessarily make the lease expiry culprit. I think it 
*could* have very well happened if the scanner was in exact same state but 
close was called by say, the client.

This will be a very difficult to reproduce without knowing exact scanner 
details, and hence difficult to debug.
The difficulty adds up given not many changes were made to this part recently 
(the once i saw seemed unrelated, like casting, etc)

So i'd say, don't make it a blocker.

But yes, a simple iteration over scanners in KeyValueHeap, instead of poll()s, 
when closing seems a good change to make.

> RegionServer is aborted due to NPE when scanner lease expired on a region
> -------------------------------------------------------------------------
>
>                 Key: HBASE-20350
>                 URL: https://issues.apache.org/jira/browse/HBASE-20350
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-beta-2
>            Reporter: Umesh Agashe
>            Assignee: Appy
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> From logs:
> {code}
> 2018-04-03 02:06:00,630 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Replaying edits from 
> hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180403004104/834545a2ae1baa47082a3bc7aab2be2f/recovered.edits/0000000000001032167
> 2018-04-03 02:06:00,724 INFO 
> org.apache.hadoop.hbase.regionserver.RSRpcServices: Scanner 
> 2120114333978460945 lease expired on region 
> IntegrationTestBigLinkedList_20180403004104,\xF1\xFE\xCB\x98e1\xF8\xD4,1522742825561.ce0d91585a2d188123173c36d0b693a5.
> 2018-04-03 02:06:00,730 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ***** ABORTING region 
> server vd0510.halxg.cloudera.com,22101,1522626204176: Uncaught exception in 
> executorService thread 
> regionserver/vd0510.halxg.cloudera.com/10.17.226.13:22101.leaseChecker *****
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.CellComparatorImpl.compareRows(CellComparatorImpl.java:202)
>   at 
> org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:74)
>   at 
> org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:61)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:207)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:178)
>   at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:721)
>   at java.util.PriorityQueue.siftDown(PriorityQueue.java:687)
>   at java.util.PriorityQueue.poll(PriorityQueue.java:595)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:228)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:483)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.close(StoreScanner.java:464)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:224)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:6666)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices$ScannerListener.leaseExpired(RSRpcServices.java:460)
>   at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:122)
>   at java.lang.Thread.run(Thread.java:748)
> 2018-04-03 02:06:00,731 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
> loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController, 
> org.apache.hadoop.hbase.security.token.TokenProvider, 
> org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint, 
> com.cloudera.navigator.audit.hbase.RegionAuditCoProcessor]
> 2018-04-03 02:06:00,737 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics as JSON 
> on abort: {
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to