[ https://issues.apache.org/jira/browse/HBASE-20475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463322#comment-16463322 ]
Zheng Hu edited comment on HBASE-20475 at 5/4/18 4:03 AM:
----------------------------------------------------------
Oh, the Hadoop QA results
(https://builds.apache.org/job/HBASE-Flaky-Tests/30342/) have expired, and
the flaky results at http://104.198.223.121:8080/job/HBASE-Flaky-Tests/
were caused by another NPE:
{code}
2018-05-03 21:05:58,075 ERROR [RS_CLOSE_REGION-regionserver/instance-2:0-1] helpers.MarkerIgnoringBase(159): ***** ABORTING region server instance-2.c.gcp-hbase.internal,42063,1525381545380: Unrecoverable exception while closing region test,,1525381436038.66de217a470764f3b37d8faebfd8e8c8., still finishing close *****
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1637)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1466)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportFileArchivalForQuotas(HRegionServer.java:3709)
    at org.apache.hadoop.hbase.regionserver.HStore.reportArchivedFilesForQuota(HStore.java:2718)
    at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2649)
    at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:929)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1615)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1612)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
{code}
In fact, all the unit tests passed, but shutting down the region server failed:
{code}
$ cat 2.org.apache.hadoop.hbase.replication.TestReplicationDroppedTables-output.txt | grep 'hbase.ResourceChecker' | grep 'replication.TestReplicationDroppedTables#'
2018-05-03 21:04:09,348 INFO [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTableNS Thread=729, OpenFileDescriptor=1368, MaxFileDescriptor=4096, SystemLoadAverage=682, ProcessCount=114, AvailableMemoryMB=2170
2018-05-03 21:04:30,331 INFO [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTableNS Thread=812 (was 729)
2018-05-03 21:04:30,373 INFO [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsStuckBehindDroppedTable Thread=812, OpenFileDescriptor=1419, MaxFileDescriptor=4096, SystemLoadAverage=606, ProcessCount=114, AvailableMemoryMB=1347
2018-05-03 21:05:19,468 INFO [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsStuckBehindDroppedTable Thread=797 (was 812), OpenFileDescriptor=1406 (was 1419), MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=808 (was 606) - SystemLoadAverage LEAK? -, ProcessCount=114 (was 114), AvailableMemoryMB=1001 (was 1347)
2018-05-03 21:05:19,531 INFO [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsBehindDroppedTableTiming Thread=795, OpenFileDescriptor=1406, MaxFileDescriptor=4096, SystemLoadAverage=808, ProcessCount=114, AvailableMemoryMB=999
2018-05-03 21:05:37,575 INFO [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsBehindDroppedTableTiming Thread=849 (was 795) - Thread LEAK? -, OpenFileDescriptor=1476 (was 1406) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=712 (was 808), ProcessCount=114 (was 114), AvailableMemoryMB=718 (was 999)
2018-05-03 21:05:37,614 INFO [Time-limited test] hbase.ResourceChecker(148): before: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTable Thread=849, OpenFileDescriptor=1476, MaxFileDescriptor=4096, SystemLoadAverage=712, ProcessCount=114, AvailableMemoryMB=708
2018-05-03 21:05:56,883 INFO [Time-limited test] hbase.ResourceChecker(172): after: replication.TestReplicationDroppedTables#testEditsDroppedWithDroppedTable Thread=889 (was 849) - Thread LEAK? -, OpenFileDescriptor=1536 (was 1476) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096 (was 4096), SystemLoadAverage=575 (was 712), ProcessCount=112 (was 114), AvailableMemoryMB=1413 (was 708) - AvailableMemoryMB LEAK? -
{code}
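To compare the before/after counters without reading each line by eye, something like the following could work. It is only a minimal sketch, assuming every "after:" line has been joined onto one line and uses the `Name=value (was old)` layout shown in the output above. The helper name `deltas` is hypothetical and is not part of HBase.

```python
import re

# "Name=123 (was 100)" pairs, as printed on ResourceChecker "after:" lines.
METRIC = re.compile(r"(\w+)=(\d+) \(was (\d+)\)")
# The test name is the first token following "after:".
TEST = re.compile(r"after:\s+(\S+)")


def deltas(line):
    """Return (test_name, {metric: now - was}), or None for non-"after:" lines.

    Hypothetical helper for illustration only; assumes one log record per line.
    """
    match = TEST.search(line)
    if match is None:
        return None
    changes = {
        name: int(now) - int(was)
        for name, now, was in METRIC.findall(line)
    }
    return match.group(1), changes


# Sample record copied from the grep output above, joined onto one line.
sample = (
    "2018-05-03 21:05:37,575 INFO [Time-limited test] hbase.ResourceChecker(172): "
    "after: replication.TestReplicationDroppedTables#testEditsBehindDroppedTableTiming "
    "Thread=849 (was 795) - Thread LEAK? -, OpenFileDescriptor=1476 (was 1406) - "
    "OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096 (was 4096), "
    "SystemLoadAverage=712 (was 808), ProcessCount=114 (was 114), "
    "AvailableMemoryMB=718 (was 999)"
)

test, changes = deltas(sample)
print(test)
for name, delta in changes.items():
    # Positive deltas for Thread / OpenFileDescriptor are the ones flagged "LEAK?".
    print(f"  {name}: {delta:+d}")
```

For the sample record this reports Thread +54 and OpenFileDescriptor +70, which are the two counters the checker flags with "LEAK?" on that line.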
was (Author: openinx):
Oh, the Hadoop QA results
(https://builds.apache.org/job/HBASE-Flaky-Tests/30342/) have expired, and
the flaky results at http://104.198.223.121:8080/job/HBASE-Flaky-Tests/
were caused by another NPE:
{code}
2018-05-03 17:05:59,008 ERROR [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=58476] master.MasterRpcServices(508): Region server instance-2.c.gcp-hbase.internal,52125,1525367143898 reported a fatal error:
***** ABORTING region server instance-2.c.gcp-hbase.internal,52125,1525367143898: Unrecoverable exception while closing region hbase:meta,,1.1588230740, still finishing close *****
Cause:
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1637)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1466)
    at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportFileArchivalForQuotas(HRegionServer.java:3709)
    at org.apache.hadoop.hbase.regionserver.HStore.reportArchivedFilesForQuota(HStore.java:2718)
    at org.apache.hadoop.hbase.regionserver.HStore.removeCompactedfiles(HStore.java:2649)
    at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:929)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1615)
    at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1612)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
{code}
> Fix the flaky TestReplicationDroppedTables unit test.
> -----------------------------------------------------
>
> Key: HBASE-20475
> URL: https://issues.apache.org/jira/browse/HBASE-20475
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 2.1.0
>
> Attachments: HBASE-20475-addendum-v2.patch,
> HBASE-20475-addendum-v3.patch, HBASE-20475-addendum.patch, HBASE-20475.patch
>
>
> See
> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)