[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011063#comment-16011063 ]
ASF GitHub Bot commented on DRILL-5496:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/833#discussion_r116559073
--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java ---
@@ -95,8 +99,63 @@ public HiveScan getPhysicalScan(String userName, JSONOptions selection, List<Sch
}
}
+ // Forced to synchronize this method to allow error recovery
+ // in the multi-threaded case. Can remove synchronized only
+ // by restructuring connections and cache to allow better
+ // recovery from failed secure connections.
+
@Override
- public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws IOException {
+ public synchronized void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws IOException {
+ try {
+ schemaFactory.registerSchemas(schemaConfig, parent);
+ return;
+
+ // Hack. We may need to retry the connection. But, we can't because
+ // the retry logic is implemented in the very connection we need to
+ // discard and rebuild. To work around, we discard the entire schema
+ // factory, and all its invalid connections. Very crude, but the
+ // easiest short-term solution until we refactor the code to do the
+ // job properly. See DRILL-5510.
+
+ } catch (Throwable e) {
+ // Unwrap exception
+ Throwable ex = e;
+ for (;;) {
+ // Case for failing on an invalid cached connection
+ if (ex instanceof MetaException ||
+ // Case for a timed-out impersonated connection, and
+ // an invalid non-secure connection used to get security
+ // tokens.
+ ex instanceof TTransportException) {
+ break;
+ }
+
+ // All other exceptions are not handled, just pass along up
+ // the stack.
+
+ if (ex.getCause() == null || ex.getCause() == ex) {
+ throw new DrillRuntimeException( "Unknown Hive error", e );
+ }
+ ex = ex.getCause();
+ }
+ }
+
+ // Build a new factory which will cause an all new set of
+ // Hive metastore connections to be created.
+
+ try {
+ schemaFactory.close();
+ } catch (Throwable t) {
+ // Ignore, we're in a bad state.
--- End diff --
Fixed.
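
For readers following the diff, the cause-chain unwrapping in the catch block can be sketched as a small standalone helper. This is only an illustration, not the code in the pull request: the class `CauseChain` and the method `findCause` are hypothetical names, and the usage note substitutes JDK exception types for Hive's `MetaException` and Thrift's `TTransportException` so the sketch compiles without those libraries.

```java
public class CauseChain {

  /**
   * Walks the cause chain of e and returns the first throwable that is an
   * instance of any of the given types, or null if none is found.
   * Hypothetical helper; the patch inlines this logic in registerSchemas().
   */
  @SafeVarargs
  public static Throwable findCause(Throwable e,
                                    Class<? extends Throwable>... types) {
    Throwable ex = e;
    while (ex != null) {
      for (Class<? extends Throwable> type : types) {
        if (type.isInstance(ex)) {
          return ex;
        }
      }
      Throwable cause = ex.getCause();
      // Guard against a throwable that reports itself as its own cause,
      // mirroring the ex.getCause() == ex check in the diff.
      if (cause == ex) {
        return null;
      }
      ex = cause;
    }
    return null;
  }
}
```

A caller might write `CauseChain.findCause(e, java.io.IOException.class)` and treat a null result as "not a recoverable connection error", which corresponds to the branch in the diff that throws `DrillRuntimeException`.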
> Must restart drillbits whenever a secure Hive metastore is restarted
> --------------------------------------------------------------------
>
> Key: DRILL-5496
> URL: https://issues.apache.org/jira/browse/DRILL-5496
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is
> restarted unless drillbits are restarted also" attempted to fix a bug in
> Drill in which Drill hangs if Hive is restarted. Now, we see that all
> subsequent "show schemas" queries fail.
> Steps to repro:
> 1. Build a secure cluster (we used MapR)
> 2. Install Hive and Drill services
> 3. Configure Drill impersonation and authentication
> 4. Restart hivemeta service
> 5. Connect to Drill and execute a query involving Hive storage; the issue occurs
> 6. Restart the drillbit services and execute the query again; the issue no
> longer occurs
> The problem occurs in the same place as the earlier fix, but might represent
> a slightly different use case: in this case the connection is secure.
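
The workaround discussed in the review (discard the whole schema factory when its cached connections go stale, then build a fresh one) can be sketched in isolation as below. This is a minimal sketch under stated assumptions, not Drill's implementation: `RebuildOnFailure` and `runWithRetry` are hypothetical names, the sketch rebuilds on any `RuntimeException` whereas the patch rebuilds only for specific Hive and Thrift errors, and the single retry after the rebuild is an assumption, since the quoted diff is cut off before that point.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

public class RebuildOnFailure<F extends AutoCloseable> {

  private final Supplier<F> builder;  // creates a fresh factory (hypothetical)
  private F factory;                  // current, possibly stale, factory

  public RebuildOnFailure(Supplier<F> builder) {
    this.builder = builder;
    this.factory = builder.get();
  }

  /**
   * Runs the action against the current factory. On failure, closes the
   * factory, builds a new one, and runs the action once more. Synchronized
   * so that only one thread at a time swaps the factory, as in the patch.
   */
  public synchronized void runWithRetry(Consumer<F> action) {
    try {
      action.accept(factory);
      return;
    } catch (RuntimeException e) {
      // Assumption: every failure is treated as a stale-connection failure.
    }
    try {
      factory.close();
    } catch (Exception ignored) {
      // Ignore; the old factory is already in a bad state.
    }
    factory = builder.get();
    action.accept(factory);  // a second failure propagates to the caller
  }
}
```

Synchronizing the whole method is the crude part the code comments acknowledge: it serializes all callers, which is the price of swapping the factory safely without restructuring the connection cache.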