[
https://issues.apache.org/jira/browse/HIVE-10410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575389#comment-14575389
]
Richard Williams commented on HIVE-10410:
-----------------------------------------
[~thejas] In the test case I use to reproduce this issue, I'm not sharing a
session among multiple queries or setting hive.exec.parallel=true. I am,
however, using the same session across multiple calls to the HiveServer2 Thrift
API--in particular, I'm running many concurrent processes that each open a
session, execute an asynchronous operation, poll HiveServer2 for the status of
that operation until it completes, and then close their session. I think that
what's happening is that the foreground thread is continuing to talk to the
metastore using its Hive object while the pooled background thread is executing
the asynchronous operation (and thus also using the same Hive object).
Right, the issue you mentioned is what I was talking about in my earlier
comment--the patch I uploaded relies on the UserGroupInformation.doAs call in
the submitted background operation to ensure that the current user in the
background thread is the same as the current user in the foreground thread.
Thus, when the background thread calls Hive.get, the call to isCurrentUserOwner
will return false if the existing MetaStoreClient is associated with the wrong
user and a new connection, this time associated with the correct user, will be
created. Presumably the code that does this is the safeguard you're referring
to?
{noformat}
public static Hive get(HiveConf c, boolean needsRefresh) throws HiveException
{
Hive db = hiveDB.get();
if (db == null || needsRefresh || !db.isCurrentUserOwner()) {
if (db != null) {
LOG.debug("Creating new db. db = " + db + ", needsRefresh = " +
needsRefresh +
", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
}
closeCurrent();
c.set("fs.scheme.class", "dfs");
Hive newdb = new Hive(c);
hiveDB.set(newdb);
return newdb;
}
{noformat}
We haven't noticed any performance issues attributable to frequent
reconnections to the metastore to as a result of this change. Then again, we
probably wouldn't; we're using Sentry and thus have HiveServer2 impersonation
disabled, so I would expect the current user to always be the user running the
HiveServer2 process. When I get a chance, I'll try changing
Hive.createMetaStoreClient to wrap the client it creates using
HiveMetaStoreClient.newSynchronizedClient instead of removing the code that
sets the background thread's Hive object.
> Apparent race condition in HiveServer2 causing intermittent query failures
> --------------------------------------------------------------------------
>
> Key: HIVE-10410
> URL: https://issues.apache.org/jira/browse/HIVE-10410
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 0.13.1
> Environment: CDH 5.3.3
> CentOS 6.4
> Reporter: Richard Williams
> Attachments: HIVE-10410.1.patch
>
>
> On our secure Hadoop cluster, queries submitted to HiveServer2 through JDBC
> occasionally trigger odd Thrift exceptions with messages such as "Read a
> negative frame size (-2147418110)!" or "out of sequence response" in
> HiveServer2's connections to the metastore. For certain metastore calls (for
> example, showDatabases), these Thrift exceptions are converted to
> MetaExceptions in HiveMetaStoreClient, which prevents RetryingMetaStoreClient
> from retrying these calls and thus causes the failure to bubble out to the
> JDBC client.
> Note that as far as we can tell, this issue appears to only affect queries
> that are submitted with the runAsync flag on TExecuteStatementReq set to true
> (which, in practice, seems to mean all JDBC queries), and it appears to only
> manifest when HiveServer2 is using the new HTTP transport mechanism. When
> both these conditions hold, we are able to fairly reliably reproduce the
> issue by spawning about 100 simple, concurrent hive queries (we have been
> using "show databases"), two or three of which typically fail. However, when
> either of these conditions do not hold, we are no longer able to reproduce
> the issue.
> Some example stack traces from the HiveServer2 logs:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.log: Got exception:
> org.apache.thrift.transport.TTransportException Read a negative frame size
> (-2147418110)!
> org.apache.thrift.transport.TTransportException: Read a negative frame size
> (-2147418110)!
> at
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:435)
> at
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
> at
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
> at
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
> at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
> at
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
> at
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The above exception being converted into a MetaException and re-thrown:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR hive.ql.exec.DDLTask:
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got
> exception: org.apache.thrift.transport.TTransportException Read a negative
> frame size (-2147418110)!)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1141)
> at
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
> at
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: MetaException(message:Got exception:
> org.apache.thrift.transport.TTransportException Read a negative frame size
> (-2147418110)!)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1116)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:839)
> at
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
> at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
> ... 22 more
> {noformat}
> The above MetaException causing the query as a whole to fail:
> {noformat}
> 2015-04-16 13:54:55,486 ERROR
> org.apache.hive.service.cli.operation.Operation: Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing
> statement: FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception:
> org.apache.thrift.transport.TTransportException Read a negative frame size
> (-2147418110)!)
> at
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:148)
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> An "out of sequence response" that occurred shortly after the above exception
> and may have been triggered by it:
> {noformat}
> 2015-04-16 13:54:55,498 ERROR hive.log: Got exception:
> org.apache.thrift.TApplicationException get_databases failed: out of sequence
> response
> org.apache.thrift.TApplicationException: get_databases failed: out of
> sequence response
> at
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
> at
> org.apache.sentry.binding.metastore.SentryHiveMetaStoreClient.getDatabases(SentryHiveMetaStoreClient.java:60)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
> at com.sun.proxy.$Proxy6.getDatabases(Unknown Source)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.getDatabasesByPattern(Hive.java:1139)
> at
> org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2445)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:364)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1554)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1321)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1139)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:962)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:957)
> at
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:145)
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
> at
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)