Josh Elser created PHOENIX-3081:
-----------------------------------
Summary: Misleading exception on async stats update after major compaction
Key: PHOENIX-3081
URL: https://issues.apache.org/jira/browse/PHOENIX-3081
Project: Phoenix
Issue Type: Improvement
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Minor
Fix For: 4.9.0, 4.8.1
Saw an error in some $dayjob testing where, while a RegionServer was going down
due to an exception, there was a scary-looking exception about being unable
to write to the stats table because an hconnection was closed. Pardon the
mismatched line numbers:
{noformat}
2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
    at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
    at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
    at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
    at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
    at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: hconnection-0x5314972b closed
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
    at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
    at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
    at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
    ... 17 more
{noformat}
Looking into this some more, the async task that updates the stats was still
running after the RegionServer had already begun shutting down. The
RegionServer had already closed all of the "userRegions" but, because this task
is async, it was still running and using the RegionServer's
CoprocessorHConnection. So the RegionServer thinks all of the user regions are
closed and that it is safe to close the HConnection. In reality, there is still
code tied to those user regions that might be running (as we can see from the
above stacktrace). The next time the StatisticsScannerCallable tries to use the
HConnection, it fails with the error above.
I think the simple fix is to just use the CoprocessorEnvironment to access the
RegionServerServices and check the {{isClosing()}} and {{isClosed()}} methods
before reporting the failure.
This is all pretty minor because the RegionServer is already shutting down, but
it is likely to mislead less-experienced users, who would think that the last
exception in the log is the problem.
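To make the idea concrete, here is a rough, self-contained sketch of the check, not the actual patch. Everything in it is an assumption for illustration: {{ShutdownState}} is a hypothetical stand-in for whatever the CoprocessorEnvironment ends up exposing, {{runUpdate}} is a made-up wrapper around the stats task, and the log strings are placeholders. It has no HBase or Phoenix dependencies so it can be run on its own.

```java
import java.util.concurrent.Callable;

// Sketch only: downgrade the "failed to update stats" error when the
// failure happens while the server is already shutting down.
final class StatsUpdateSketch {

    /** Hypothetical stand-in for the shutdown state reachable from the coprocessor environment. */
    interface ShutdownState {
        boolean isClosing();
        boolean isClosed();

        /** Convenience factory for a fixed state (illustration only). */
        static ShutdownState of(boolean closing, boolean closed) {
            return new ShutdownState() {
                @Override public boolean isClosing() { return closing; }
                @Override public boolean isClosed() { return closed; }
            };
        }
    }

    /**
     * Runs the stats update and returns the line that would be logged.
     * A failure during shutdown is reported at DEBUG instead of ERROR,
     * since the closed connection is a symptom rather than the cause.
     */
    static String runUpdate(ShutdownState state, Callable<Void> update) {
        try {
            update.call();
            return "OK";
        } catch (Exception e) {
            if (state.isClosing() || state.isClosed()) {
                return "DEBUG: skipping stats update, server is shutting down";
            }
            return "ERROR: Failed to update statistics table! " + e.getMessage();
        }
    }

    private StatsUpdateSketch() {}
}
```

With this shape, a failure while the state reports closing or closed no longer produces the scary ERROR line, while a failure on a healthy server is still logged as before.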
Will put up a patch shortly.