[
https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003524#comment-16003524
]
Paul Rogers commented on DRILL-5496:
------------------------------------
Full stack trace at failure:
{code}
2017-05-01 16:03:00,232 [26f86b8b-c25f-4593-99b6-03f1d927aeee:foreman] WARN o.a.d.e.s.h.DrillHiveMetaStoreClient - Failure while attempting to get hive databases. Retries once.
org.apache.hadoop.hive.metastore.api.MetaException: Got exception: org.apache.thrift.transport.TTransportException null
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1213) ~[hive-metastore-1.2.0-mapr-1608.jar:1.2.0-mapr-1608]
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1033) ~[hive-metastore-1.2.0-mapr-1608.jar:1.2.0-mapr-1608]
  at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient.getDatabasesHelper(DrillHiveMetaStoreClient.java:203) ~[drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$DatabaseLoader.load(DrillHiveMetaStoreClient.java:505) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$DatabaseLoader.load(DrillHiveMetaStoreClient.java:498) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache.get(LocalCache.java:3937) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) [guava-18.0.jar:na]
  at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) [guava-18.0.jar:na]
  at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithAuthzWithCaching.getDatabases(DrillHiveMetaStoreClient.java:411) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSubSchema(HiveSchemaFactory.java:139) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.<init>(HiveSchemaFactory.java:133) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory.registerSchemas(HiveSchemaFactory.java:118) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.hive.HiveStoragePlugin.registerSchemas(HiveStoragePlugin.java:100) [drill-storage-hive-core-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.StoragePluginRegistryImpl$DrillSchemaFactory.registerSchemas(StoragePluginRegistryImpl.java:396) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:110) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:99) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.ops.QueryContext.getRootSchema(QueryContext.java:163) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.ops.QueryContext.getRootSchema(QueryContext.java:152) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema(QueryContext.java:138) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.planner.sql.SqlConverter.<init>(SqlConverter.java:110) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:101) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) [drill-java-exec-1.10.0.jar:1.10.0]
  at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) [drill-java-exec-1.10.0.jar:1.10.0]
{code}
As it turns out, the code already attempts to retry the connection (added by
DRILL-4964):
{code}
protected static List<String> getDatabasesHelper(final IMetaStoreClient mClient) throws TException {
  try {
    return mClient.getAllDatabases();
  } catch (MetaException e) {
    /*
       HiveMetaStoreClient is encapsulating both the MetaException/TExceptions inside MetaException.
       Since we don't have good way to differentiate, we will close older connection and retry once.
       This is only applicable for getAllTables and getAllDatabases method since other methods are
       properly throwing correct exceptions.
    */
    logger.warn("Failure while attempting to get hive databases. Retries once.", e);
    try {
      mClient.close();
    } catch (Exception ex) {
      logger.warn("Failure while attempting to close existing hive metastore connection. May leak connection.", ex);
    }
    mClient.reconnect();
    return mClient.getAllDatabases();
  }
}
{code}
The log says:
{code}
WARN o.a.d.e.s.h.DrillHiveMetaStoreClient - Failure while attempting to get hive databases. Retries once.
{code}
So execution reached the statement that emits this warning. That is, we caught
the exception from the invalid connection and attempted to retry.
But the stack trace says:
{code}
DrillHiveMetaStoreClient.getDatabasesHelper(DrillHiveMetaStoreClient.java:203)
{code}
[Line 203|https://github.com/apache/drill/blob/b657d44feb527c8e3d83c9996c9220ec4d50aaf3/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java]
is the first call to {{mClient.getAllDatabases()}}. This suggests that the
retry was not actually done.
Consider the code snippet shown earlier. Stepping through the failure scenario
shows that the following line fails:
{code}
mClient.reconnect();
{code}
Evidently this retry code does not work for a secure connection.
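The control flow can be illustrated with a minimal, self-contained sketch. It does not use the Hive API: {{FakeClient}}, {{StaleSecureClient}} and {{RetryFlowSketch}} are hypothetical stand-ins, the sketch catches {{Exception}} rather than {{MetaException}}, and the failure of {{reconnect()}} is simply assumed in order to mirror the behavior observed above. It shows that when {{reconnect()}} throws, the second {{getAllDatabases()}} call is never reached and the exception escapes to the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the metastore client; NOT the Hive API. */
interface FakeClient {
  List<String> getAllDatabases() throws Exception;
  void close();
  void reconnect() throws Exception;
}

class RetryFlowSketch {

  /** Fake client that records each call and fails the way the comment describes. */
  static final class StaleSecureClient implements FakeClient {
    final List<String> calls = new ArrayList<>();

    @Override
    public List<String> getAllDatabases() throws Exception {
      calls.add("getAllDatabases");
      throw new Exception("stale connection");
    }

    @Override
    public void close() {
      calls.add("close");
    }

    @Override
    public void reconnect() throws Exception {
      calls.add("reconnect");
      // Assumption for illustration: reconnect fails on a secure connection.
      throw new Exception("reconnect failed");
    }
  }

  /** Same shape as getDatabasesHelper: try, close, reconnect, retry once. */
  static List<String> getDatabasesHelper(FakeClient client) throws Exception {
    try {
      return client.getAllDatabases();
    } catch (Exception e) {
      client.close();
      client.reconnect();              // throws here ...
      return client.getAllDatabases(); // ... so this retry is never reached
    }
  }

  /** Runs the helper against the failing client and returns the call order. */
  static List<String> trace() {
    StaleSecureClient client = new StaleSecureClient();
    try {
      getDatabasesHelper(client);
    } catch (Exception expected) {
      // The exception thrown by reconnect() propagates to the caller.
    }
    return client.calls;
  }

  public static void main(String[] args) {
    System.out.println(trace()); // prints [getAllDatabases, close, reconnect]
  }
}
```

The recorded call order contains only one {{getAllDatabases}} entry, which is consistent with the stack trace pointing at the first call only.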
> Must restart drillbits whenever a secure Hive metastore is restarted
> --------------------------------------------------------------------
>
> Key: DRILL-5496
> URL: https://issues.apache.org/jira/browse/DRILL-5496
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is
> restarted unless drillbits are restarted also" attempted to fix a bug in
> Drill in which Drill hangs if Hive is restarted. Now, we see that all
> subsequent "show schemas" queries fail.
> Steps to repro:
> 1. Build a secure cluster (we used MapR)
> 2. Install Hive and Drill services
> 3. Configure Drill impersonation and authentication
> 4. Restart the hivemeta service
> 5. Connect to Drill and execute a query involving Hive storage; the issue occurs
> 6. Restart the drillbit services and execute the query again; the issue no
> longer occurs
> The problem occurs in the same place as the earlier fix, but might represent
> a slightly different use case: in this case the connection is secure.