[ 
https://issues.apache.org/jira/browse/HBASE-20475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463322#comment-16463322
 ] 

Zheng Hu edited comment on HBASE-20475 at 5/4/18 4:03 AM:
----------------------------------------------------------

Oh, the Hadoop QA results 
(https://builds.apache.org/job/HBASE-Flaky-Tests/30342/) have expired, and 
the flaky results at http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ 
were caused by another NPE:

{code}
2018-05-03 21:05:58,075 ERROR [RS_CLOSE_REGION-regionserver/instance-2:0-1] helpers.MarkerIgnoringBase(159): ***** ABORTING region server instance-2.c.gcp-hbase.internal,42063,1525381545380: Unrecoverable exception while closing region test,,1525381436038.66de217a470764f3b37d8faebfd8e8c8., still finishing close *****
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1637)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1466)
        at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportFileArchivalForQuotas(HRegionServer.java:3709)
        at org.apache.hadoop.hbase.regionserver.HStore.reportArchivedFilesForQuota(HStore.java:2718)
        at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2649)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:929)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1615)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1612)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
{code}

In fact, all the unit tests passed, but shutting down the region server failed:

{code}
$ cat 2.org.apache.hadoop.hbase.replication.TestReplicationDroppedTables-output.txt | grep 'hbase.ResourceChecker' | grep 'replication.TestReplicationDroppedTables#'
2018-05-03 21:04:09,348 INFO  [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTableNS Thread=729, OpenFileDescriptor=1368, MaxFileDescriptor=4096, SystemLoadAverage=682, ProcessCount=114, AvailableMemoryMB=2170
2018-05-03 21:04:30,331 INFO  [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTableNS Thread=812 (was 729)
2018-05-03 21:04:30,373 INFO  [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsStuckBehindDroppedTable Thread=812, OpenFileDescriptor=1419, MaxFileDescriptor=4096, SystemLoadAverage=606, ProcessCount=114, AvailableMemoryMB=1347
2018-05-03 21:05:19,468 INFO  [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsStuckBehindDroppedTable Thread=797 (was 812), OpenFileDescriptor=1406 (was 1419), MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=808 (was 606) - SystemLoadAverage LEAK? -, ProcessCount=114 (was 114), AvailableMemoryMB=1001 (was 1347)
2018-05-03 21:05:19,531 INFO  [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsBehindDroppedTableTiming Thread=795, OpenFileDescriptor=1406, MaxFileDescriptor=4096, SystemLoadAverage=808, ProcessCount=114, AvailableMemoryMB=999
2018-05-03 21:05:37,575 INFO  [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsBehindDroppedTableTiming Thread=849 (was 795) - Thread LEAK? -, OpenFileDescriptor=1476 (was 1406) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=712 (was 808), ProcessCount=114 (was 114), AvailableMemoryMB=718 (was 999)
2018-05-03 21:05:37,614 INFO  [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTable Thread=849, OpenFileDescriptor=1476, MaxFileDescriptor=4096, SystemLoadAverage=712, ProcessCount=114, AvailableMemoryMB=708
2018-05-03 21:05:56,883 INFO  [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTable Thread=889 (was 849) - Thread LEAK? -, OpenFileDescriptor=1536 (was 1476) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=575 (was 712), ProcessCount=112 (was 114), AvailableMemoryMB=1413 (was 708) - AvailableMemoryMB LEAK? -
{code}
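
For anyone who wants to repeat this check without reading the grep output by eye, here is a minimal sketch (not part of any patch) that pairs the ResourceChecker "before:" and "after:" entries per test method. The function names summarize and finished are made up for illustration, and the regex only assumes the log layout shown above; it tolerates the entries being wrapped across lines.

```python
import re

# Assumed layout, taken from the ResourceChecker output above:
#   hbase.ResourceChecker(NNN): before|after: <class>#<method> Thread=<count> ...
# \s+ is used between tokens so hard-wrapped log lines still match.
PATTERN = re.compile(
    r"hbase\.ResourceChecker\(\d+\):\s+(before|after):\s+\S+#(\w+)\s+Thread=(\d+)"
)


def summarize(log_text):
    """Map each test method to the thread count logged in each phase."""
    tests = {}
    for phase, method, threads in PATTERN.findall(log_text):
        tests.setdefault(method, {})[phase] = int(threads)
    return tests


def finished(tests):
    """Return the methods that logged both a 'before' and an 'after' entry."""
    return sorted(
        method
        for method, phases in tests.items()
        if "before" in phases and "after" in phases
    )
```

Fed the contents of the -output.txt file above, finished(summarize(text)) should list all four test methods, which is consistent with the tests themselves completing and the failure coming from the region server abort during shutdown.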




was (Author: openinx):
Oh, the Hadoop QA results 
(https://builds.apache.org/job/HBASE-Flaky-Tests/30342/) have expired, and 
the flaky results at http://104.198.223.121:8080/job/HBASE-Flaky-Tests/ 
were caused by another NPE:

{code}
2018-05-03 17:05:59,008 ERROR [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=58476] master.MasterRpcServices(508): Region server instance-2.c.gcp-hbase.internal,52125,1525367143898 reported a fatal error:
***** ABORTING region server instance-2.c.gcp-hbase.internal,52125,1525367143898: Unrecoverable exception while closing region hbase:meta,,1.1588230740, still finishing close *****
Cause:
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1637)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1466)
        at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportFileArchivalForQuotas(HRegionServer.java:3709)
        at org.apache.hadoop.hbase.regionserver.HStore.reportArchivedFilesForQuota(HStore.java:2718)
        at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2649)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:929)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1615)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1612)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
{code}

> Fix the flaky TestReplicationDroppedTables unit test.
> -----------------------------------------------------
>
>                 Key: HBASE-20475
>                 URL: https://issues.apache.org/jira/browse/HBASE-20475
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0
>
>         Attachments: HBASE-20475-addendum-v2.patch, 
> HBASE-20475-addendum-v3.patch, HBASE-20475-addendum.patch, HBASE-20475.patch
>
>
> See 
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html



