Ted Yu created HBASE-21139:
------------------------------
Summary: Concurrent invocations of MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered MetricsTableSource
Key: HBASE-21139
URL: https://issues.apache.org/jira/browse/HBASE-21139
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
From the test output of TestRestoreFlushSnapshotFromClient:
{code}
2018-09-01 21:09:38,174 WARN [member: 'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375)
    at org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306)
    at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
{code}
In MetricsTableAggregateSourceImpl.getOrCreateTableSource:
{code}
MetricsTableSource prev = tableSources.putIfAbsent(table, source);
if (prev != null) {
  return prev;
} else {
  // register the new metrics now
  register(source);
}
{code}
Suppose threads t1 and t2 execute the above code concurrently.
t1 calls putIfAbsent first and proceeds to run {{register(source)}}.
Before {{register(source)}} completes, a context switch lets t2 reach putIfAbsent, which returns the instance stored by t1. That instance is not registered yet.
t2 then uses the unregistered MetricsTableSource, which leads to the NullPointerException shown in the stack trace above.
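One possible way to close this window is sketched below. It is not necessarily the fix that will be adopted for this issue. {{TableSource}} and {{AggregateSource}} are hypothetical stand-ins for MetricsTableSource and MetricsTableAggregateSourceImpl, and the {{register()}} method here only flips a flag. The idea is to register the instance inside {{ConcurrentHashMap.computeIfAbsent}}, so that no other thread can obtain the instance from the map before registration has finished.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for MetricsTableSource; not the HBase class.
final class TableSource {
  private final String table;
  private volatile boolean registered;

  TableSource(String table) {
    this.table = table;
  }

  String table() {
    return table;
  }

  // Placeholder for the real registration work.
  void register() {
    registered = true;
  }

  boolean isRegistered() {
    return registered;
  }
}

// Hypothetical stand-in for MetricsTableAggregateSourceImpl.
final class AggregateSource {
  private final ConcurrentHashMap<String, TableSource> tableSources =
      new ConcurrentHashMap<>();

  TableSource getOrCreateTableSource(String table) {
    // ConcurrentHashMap.computeIfAbsent applies the function at most once
    // per key and makes the value visible to other callers only after the
    // function returns, so a caller never receives an unregistered source.
    return tableSources.computeIfAbsent(table, name -> {
      TableSource source = new TableSource(name);
      source.register();
      return source;
    });
  }
}
```

With this shape, both t1 and t2 receive the same, already registered instance. The trade-off is that the mapping function runs while the map bin is locked, so it should stay short and must not update {{tableSources}} itself.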