[
https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003843#comment-16003843
]
Paul Rogers commented on DRILL-5496:
------------------------------------
The problem appears to be real-world complexity that the original code did not
anticipate.
The Hive storage plugin holds onto an instance of {{HiveSchemaFactory}}, which
holds onto a non-secure Hive metastore client connection:
{code}
public class HiveSchemaFactory implements SchemaFactory {
  // MetaStoreClient created using process user credentials
  private final DrillHiveMetaStoreClient processUserMetastoreClient;
{code}
Then, the factory creates a cache of impersonated connections. The code handles
retries of the impersonated connections. But when security is enabled, each
secure connection requires an authentication token. That token must be
retrieved from the Metastore server using the above
{{processUserMetastoreClient}}.
When the Hive metastore bounces, both the impersonated connection *and*
{{processUserMetastoreClient}} are closed. We attempt to retry the impersonated
connection, but to do so, we send a message over the
{{processUserMetastoreClient}} connection, which is now closed.
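To make the failure concrete, here is a minimal sketch of the dependency. All
class and method names below are hypothetical stand-ins, not Drill's or Hive's
actual API; the only point is that the retry of the impersonated connection
depends on a second connection that died in the same bounce.

```java
import java.io.IOException;

// Hypothetical model of the dependency described above; not Drill's real classes.
class TokenDependencySketch {

    /** Stand-in for the process-user metastore connection. */
    static class ProcessUserClient {
        private boolean open = true;

        void close() {
            open = false;
        }

        /** Tokens can only be fetched while this connection is open. */
        String fetchToken(String user) throws IOException {
            if (!open) {
                throw new IOException("process-user connection is closed");
            }
            return "token-for-" + user;
        }
    }

    /** Stand-in for an impersonated connection, which needs a token to open. */
    static class ImpersonatedClient {
        final String token;

        ImpersonatedClient(String token) {
            this.token = token;
        }
    }

    /** Reconnecting the impersonated client goes through the process-user client. */
    static ImpersonatedClient reconnect(ProcessUserClient processUser, String user)
            throws IOException {
        return new ImpersonatedClient(processUser.fetchToken(user));
    }

    public static void main(String[] args) {
        ProcessUserClient processUser = new ProcessUserClient();
        processUser.close(); // simulates the metastore bounce closing this connection too
        try {
            reconnect(processUser, "alice");
            System.out.println("reconnected");
        } catch (IOException e) {
            System.out.println("retry failed: " + e.getMessage());
        }
    }
}
```

In this model the retry can only succeed if the process-user connection is
restored first, which is the step the existing retry logic does not perform.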
The above plays out when we allow the Hive connection to time out and then
attempt to create a new one.
However, in the case where the impersonated connection is still in the cache, a
different scenario plays out. Here, we try to reconnect using the same
connection. But, the security token is no longer valid, so the reconnect fails.
In that case, we need to recreate the connection with a new token.
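A rough sketch of what "recreate with a new token" could look like for the
cached case. The names are made up for illustration, and the restart counter is
only an assumed way to model tokens becoming invalid after a bounce; it says
nothing about how the metastore actually tracks tokens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache of impersonated connections; not Drill's real classes.
class StaleTokenSketch {

    /** Stand-in for the metastore: a token is valid only until the next restart. */
    static class Metastore {
        private int restarts = 0;

        void restart() {
            restarts++;
        }

        String issueToken(String user) {
            return user + "@" + restarts;
        }

        boolean accepts(String token) {
            return token.endsWith("@" + restarts);
        }
    }

    /** Stand-in for a cached impersonated connection and the token it was built with. */
    static class CachedConnection {
        final String token;

        CachedConnection(String token) {
            this.token = token;
        }
    }

    private final Metastore metastore;
    private final Map<String, CachedConnection> cache = new HashMap<>();

    StaleTokenSketch(Metastore metastore) {
        this.metastore = metastore;
    }

    /** Returns a usable connection, replacing the cached one if its token is stale. */
    CachedConnection get(String user) {
        CachedConnection conn = cache.get(user);
        if (conn == null || !metastore.accepts(conn.token)) {
            // Reconnecting with the old token would be rejected, so build a
            // new connection around a newly issued token instead.
            conn = new CachedConnection(metastore.issueToken(user));
            cache.put(user, conn);
        }
        return conn;
    }

    public static void main(String[] args) {
        Metastore metastore = new Metastore();
        StaleTokenSketch connections = new StaleTokenSketch(metastore);
        CachedConnection before = connections.get("alice");
        metastore.restart();
        CachedConnection after = connections.get("alice");
        System.out.println("connection replaced: " + (before != after));
    }
}
```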
This retry is made hugely more complex because the code that needs to do the
retry is far down in the call stack on the very object to be recreated.
In short, the design of this code is badly broken and can't properly handle the
complexity of secure connection retries.
A possible workaround is to discard the whole enchilada (the process-user
client and the cached impersonated connections) and rebuild it all when a
connection error occurs. This is a huge hack, but might allow the existing
code to work without major change.
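As a sketch only, the discard-and-rebuild idea might look like the following.
{{ClientBundle}} and everything else here are hypothetical names used to
illustrate the shape of the hack; none of it is existing Drill code.

```java
import java.io.IOException;

// Hypothetical wrapper illustrating discard-and-rebuild; not Drill's real classes.
class RebuildOnErrorSketch {

    /** Stand-in for the process-user client plus the impersonated-connection cache. */
    static class ClientBundle {
        final int generation;
        boolean broken = false;

        ClientBundle(int generation) {
            this.generation = generation;
        }

        String listSchemas() throws IOException {
            if (broken) {
                throw new IOException("connection lost");
            }
            return "schemas from bundle " + generation;
        }
    }

    private ClientBundle bundle = new ClientBundle(0);

    /** Test hook: marks every connection in the current bundle as dead. */
    synchronized void simulateMetastoreBounce() {
        bundle.broken = true;
    }

    /** On any connection error, drop all clients and tokens, rebuild, retry once. */
    synchronized String listSchemas() throws IOException {
        try {
            return bundle.listSchemas();
        } catch (IOException e) {
            // No attempt to repair individual connections: replace everything.
            bundle = new ClientBundle(bundle.generation + 1);
            return bundle.listSchemas();
        }
    }

    public static void main(String[] args) throws IOException {
        RebuildOnErrorSketch plugin = new RebuildOnErrorSketch();
        System.out.println(plugin.listSchemas());
        plugin.simulateMetastoreBounce();
        System.out.println(plugin.listSchemas());
    }
}
```

The retry happens at the top of the call stack, on the object that owns the
connections, rather than deep inside the connection being replaced. A second
failure is passed up to the caller instead of being retried again.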
> Must restart drillbits whenever a secure Hive metastore is restarted
> --------------------------------------------------------------------
>
> Key: DRILL-5496
> URL: https://issues.apache.org/jira/browse/DRILL-5496
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is
> restarted unless drillbits are restarted also" attempted to fix a bug in
> Drill in which Drill hangs if Hive is restarted. Now, we see that all
> subsequent "show schemas" queries fail.
> Steps to repro:
> 1. Build a secure cluster (we used MapR)
> 2. Install Hive and Drill services
> 3. Configure drill impersonation and authentication
> 4. Restart hivemeta service
> 5. Connect to drill and execute query involving hive storage, issue occurs
> 6. Restart the drill-bits services and execute the query, issue is no longer
> hit
> The problem occurs in the same place as the earlier fix, but might represent
> a slightly different use case: in this case the connection is secure.