stack created HBASE-14533:
-----------------------------
Summary: Thrift client gets "AsyncProcess: Failed to get region
location .... closed"
Key: HBASE-14533
URL: https://issues.apache.org/jira/browse/HBASE-14533
Project: HBase
Issue Type: Bug
Components: REST, Thrift
Affects Versions: 1.0.0
Reporter: stack
An internal Python client has been getting the stack trace below since HBASE-13437:
{code}
2015-09-30 11:27:31,670 runner ERROR : scheduler executor
error
2015-09-30 11:27:31,674 runner ERROR : Traceback (most
recent call last):
File
"/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/runner.py",
line 82, in run
fetch_list = self.__scheduler_executor.run()
File
"/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/scheduler.py",
line 35, in run
with self.__fetch_db_dao.get_scanner() as scanner:
File
"/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_dao.py",
line 57, in get_scanner
caching=caching, field_filter_list=field_filter_list)
File
"/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_client_template.py",
line 104, in get_entity_scanner
self.__fix_cfs(self.__filter_columns(field_filter_list)), caching)
File
"/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_entity_scanner.py",
line 81, in open
self.__scanner_id = client.scannerOpenWithScan(table_name, scan)
File
"/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py",
line 1494, in scannerOpenWithScan
return self.recv_scannerOpenWithScan()
File
"/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py",
line 1518, in recv_scannerOpenWithScan
raise result.io
IOError:
IOError(message="org.apache.hadoop.hbase.client.RetriesExhaustedException:
Can't get the location\n\tat
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308)\n\tat
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149)\n\tat
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)\n\tat
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)\n\tat
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293)\n\tat
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)\n\tat
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)\n\tat
org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)\n\tat
org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)\n\tat
org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1446)\n\tat
sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)\n\tat
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.lang.reflect.Method.invoke(Method.java:606)\n\tat
org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)\n\tat
com.sun.proxy.$Proxy14.scannerOpenWithScan(Unknown Source)\n\tat
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4609)\n\tat
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4593)\n\tat
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat
org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)\n\tat
org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat
java.lang.Thread.run(Thread.java:745)\nCaused by: java.io.IOException:
hconnection-0xa8e1bf9 closed\n\tat
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)\n\tat
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299)\n\t...
23 more\n")
{code}
On the thrift server side we see this:
{code}
2015-09-30 07:22:59,427 ERROR org.apache.hadoop.hbase.client.AsyncProcess:
Failed to get region location
java.io.IOException: hconnection-0x4142991e closed
at
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)
at
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
at
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1496)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1107)
at
org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRowTs(ThriftServerRunner.java:1256)
at
org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRow(ThriftServerRunner.java:1209)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
at com.sun.proxy.$Proxy14.mutateRow(Unknown Source)
at
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4334)
at
org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4318)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at
org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)
at
org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
HBASE-13437 has us actually execute a close on timeout; before, we'd mark the
connection closed but would never call close on it.
A background chore goes around stamping Connections in the ConnectionCache
as 'closed' if they have not been used in ten minutes. The 'close' can come in
at any time, in particular between the point at which we get the
table/connection and the point at which we go to use it, i.e. when we flush
puts. It is at the flush-puts point that we get the above 'AsyncProcess: Failed
to get region location' (it is not a failure to find the region location but
rather our noticing that the connection has been closed).
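To make the ordering concrete, here is a toy model of the race in Python. It is only a sketch: ToyConnection, ToyConnectionCache and the two-step chore are hypothetical stand-ins, not the HBase classes, and the exact interleaving shown is an assumption about how the close could land between 'get' and 'use'.

```python
class ToyConnection:
    """Hypothetical stand-in for a cached connection; not the HBase class."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def locate_region(self, row):
        # Mirrors the symptom in the traces: the lookup fails only because
        # the connection is closed, not because the region cannot be found.
        if self.closed:
            raise IOError("%s closed" % self.name)
        return "region-for-%s" % row


class ToyConnectionCache:
    """Hypothetical cache whose chore closes connections idle too long."""

    def __init__(self, max_idle_seconds):
        self.max_idle_seconds = max_idle_seconds
        self._entries = {}  # user -> (connection, last_used)

    def get(self, user, now):
        entry = self._entries.get(user)
        if entry is None or entry[0].closed:
            conn = ToyConnection("hconnection-%s" % user)
        else:
            conn = entry[0]
        self._entries[user] = (conn, now)  # refresh the last-used stamp
        return conn

    def find_idle(self, now):
        # Chore, step 1: decide which connections look idle.
        return [conn for conn, last_used in self._entries.values()
                if now - last_used > self.max_idle_seconds]

    def close_connections(self, victims):
        # Chore, step 2: actually close them (the behaviour added on timeout).
        for conn in victims:
            conn.close()


def demo_race():
    cache = ToyConnectionCache(max_idle_seconds=600)
    cache.get("alice", now=0)              # an earlier request used the connection
    victims = cache.find_idle(now=601)     # chore decides it is idle
    conn = cache.get("alice", now=601)     # handler gets the same, still-open connection
    cache.close_connections(victims)       # chore closes it underneath the handler
    try:
        conn.locate_region("row1")         # handler goes to flush its puts
    except IOError as e:
        return str(e)
    return "ok"
```

Calling demo_race() surfaces the 'closed' error at the point of use, even though the connection was open when the handler obtained it.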
Attempts at reproducing this issue locally by letting the Connection time out
can generate the above exception if a certain dance is done, but it is hard to
do; I am not reproducing the actual usage by the aforementioned client.
Next steps would be setting up a Python client talking via Thrift and then
trying to use a connection after it has been evicted from the connection cache.
Another thing to try is a pool of connections on the Python side; connections
are identified by user and table.
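As a rough idea of what a pool on the Python side might look like, here is a minimal sketch keyed by (user, table). KeyedConnectionPool and its factory argument are hypothetical names for illustration; how a real client would open its Thrift connection is left to the factory and is not shown.

```python
class KeyedConnectionPool:
    """Hypothetical pool that keeps one connection per (user, table) key."""

    def __init__(self, factory):
        # factory(user, table) is assumed to return a new client connection.
        self._factory = factory
        self._conns = {}

    def get(self, user, table):
        key = (user, table)
        conn = self._conns.get(key)
        if conn is None:
            conn = self._factory(user, table)
            self._conns[key] = conn
        return conn

    def discard(self, user, table):
        # Drop a connection, e.g. after it reports that it has been closed,
        # so that the next get() builds a fresh one.
        self._conns.pop((user, table), None)
```

A caller that sees the 'closed' error could discard the entry for that user and table and call get() again before retrying.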