Josh Elser created PHOENIX-3081:
-----------------------------------

             Summary: Misleading exception on async stats update after major compaction
                 Key: PHOENIX-3081
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3081
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Josh Elser
            Assignee: Josh Elser
            Priority: Minor
             Fix For: 4.9.0, 4.8.1


Saw an error in some $dayjob testing where, while a RegionServer was going down 
due to an exception, there was a scary-looking exception about being unable 
to write to the stats table because an hconnection was closed. Pardon the 
mis-matched line numbers:

{noformat}
2016-07-17 07:52:13,229 ERROR [phoenix-update-statistics-0] stats.StatisticsScanner: Failed to update statistics table!
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:309)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:152)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
  at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
  at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
  at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
  at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
  at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
  at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
  at org.apache.hadoop.hbase.client.HTableWrapper.getScanner(HTableWrapper.java:215)
  at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:136)
  at org.apache.phoenix.schema.stats.StatisticsWriter.deleteStats(StatisticsWriter.java:230)
  at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:117)
  at org.apache.phoenix.schema.stats.StatisticsScanner$StatisticsScannerCallable.call(StatisticsScanner.java:102)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: hconnection-0x5314972b closed
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1153)
  at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1133)
  at org.apache.hadoop.hbase.client.CoprocessorHConnection.relocateRegion(CoprocessorHConnection.java:41)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1338)
  at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
  at org.apache.hadoop.hbase.client.CoprocessorHConnection.locateRegion(CoprocessorHConnection.java:41)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
  ... 17 more
{noformat}

Looking into this some more, this async task to update the stats was still 
running after the RegionServer had already begun shutting down. The 
RegionServer had already closed all of the "userRegions", but, because this 
task is async, it was still running and using the RegionServer's 
CoprocessorHConnection. So, the RegionServer thinks all of the user regions are 
closed and that it is safe to close the HConnection. In reality, there is still 
code tied to those user regions that might be running (as we can see in the 
above stacktrace). The next time the StatisticsScannerCallable tries to use the 
HConnection, it fails.

I think the simple fix is to just use the CoprocessorEnvironment to access the 
RegionServerServices and use the {{isClosing()}} and {{isClosed()}} methods to 
detect that the server is going down before reporting the failure as an error. 
This is all pretty minor because the RegionServer is already shutting down, but 
it is likely misleading to less-experienced users who would think that the last 
exception in the log is the problem.
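
A rough sketch of the idea, not the actual patch: when the stats update fails, check whether the server is already going down and, if so, log quietly instead of at ERROR. Everything here is a hypothetical stand-in so that the example runs without HBase on the classpath: {{ServerState}} plays the role of whatever is reachable through the CoprocessorEnvironment, {{StatsUpdate}} plays the role of the stats-table write, and the {{isClosing()}}/{{isClosed()}} names simply mirror the description above (the real methods may be named differently).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class StatsShutdownSketch {

    /** Hypothetical stand-in for the server state a coprocessor could query. */
    interface ServerState {
        boolean isClosing();
        boolean isClosed();
    }

    /** Hypothetical stand-in for the stats-table write that may fail. */
    interface StatsUpdate {
        void run() throws IOException;
    }

    /** Collected log lines, so the sketch needs no logging framework. */
    static final List<String> LOG = new ArrayList<>();

    /** Builds a ServerState that always reports the given flags. */
    static ServerState fixedState(final boolean closing, final boolean closed) {
        return new ServerState() {
            @Override public boolean isClosing() { return closing; }
            @Override public boolean isClosed() { return closed; }
        };
    }

    /** Runs the update; downgrades the failure message if the server is going down. */
    static void updateStats(ServerState server, StatsUpdate update) {
        try {
            update.run();
        } catch (IOException e) {
            if (server.isClosing() || server.isClosed()) {
                // Expected during shutdown: the shared connection is already closed.
                LOG.add("DEBUG Ignoring stats update failure during shutdown: " + e.getMessage());
            } else {
                // Server is healthy, so this failure is worth surfacing.
                LOG.add("ERROR Failed to update statistics table! " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        StatsUpdate failing = () -> { throw new IOException("hconnection closed"); };
        updateStats(fixedState(true, false), failing);   // server shutting down
        updateStats(fixedState(false, false), failing);  // server healthy
        for (String line : LOG) {
            System.out.println(line);
        }
    }
}
```

The check happens only in the failure path, so a healthy server pays nothing extra, and a failure on a healthy server is still reported at ERROR.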

Will put up a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
