[ https://issues.apache.org/jira/browse/DRILL-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008494#comment-16008494 ]
Paul Rogers commented on DRILL-5496:
------------------------------------
As it turns out, the Hive client in the Hive storage plugin is not designed to
handle security.
* When we start the Hive storage plugin, we create a single instance of the
{{HiveSchemaFactory}}.
* {{HiveSchemaFactory}} holds on to a {{DrillHiveMetaStoreClient}} connection.
In the secure case, this connection is used to get the security certificates
needed to create secure connections.
* {{HiveSchemaFactory}} has a Guava loading cache of user-specific, secure
connections.
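A minimal sketch of the layout just described, written in Java since that is Drill's language. It is not Drill's actual code: {{MetastoreConnection}}, {{SchemaFactoryModel}}, {{clientFor}} and {{issueTicket}} are hypothetical names, a {{ConcurrentHashMap}} stands in for the Guava loading cache, and tickets are modeled as plain strings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a metastore client; it only records
// the user it serves and the ticket it was opened with.
class MetastoreConnection {
    final String user;
    final String ticket;

    MetastoreConnection(String user, String ticket) {
        this.user = user;
        this.ticket = ticket;
    }
}

// Simplified model of the current layout. A ConcurrentHashMap stands
// in for the Guava LoadingCache used by the real plugin.
class SchemaFactoryModel {
    // One non-secure connection, created once when the plugin starts.
    private final MetastoreConnection processClient =
        new MetastoreConnection("drill", "process-ticket");

    // Per-user secure connections, created lazily on first use and
    // then reused for later requests from that user.
    private final ConcurrentMap<String, MetastoreConnection> userClients =
        new ConcurrentHashMap<>();

    MetastoreConnection clientFor(String user) {
        return userClients.computeIfAbsent(user,
            u -> new MetastoreConnection(u, issueTicket(u)));
    }

    // The non-secure connection is what obtains the ticket for a
    // secure connection (modeled here as string concatenation).
    private String issueTicket(String user) {
        return "ticket-for-" + user + "-via-" + processClient.ticket;
    }
}
```

In this model nothing ever replaces {{processClient}}, so once the metastore restarts, every new ticket request still goes through a dead connection.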
When the Hive metastore goes down, all connections become invalid, including the
non-secure connection and all the secure connections. Currently, we try to handle
the problem as follows.
If a secure connection times out:
* Use the (now-invalid) insecure connection to get another ticket. But since
that connection isn't valid, we can't reconnect and so always fail.
If we try to use a cached secure connection before it times out, this happens:
* Try to send a message.
* When that fails, try to reconnect (using the old certificate from the prior
session).
* When that fails, give up.
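The two failure paths above can be reproduced with a toy model in which each metastore restart starts a new epoch and rejects connections and tickets from earlier epochs. This is an illustrative assumption about how staleness shows up, not how the Hive metastore or Kerberos actually track sessions; {{Metastore}}, {{HmsConnection}} and {{FailurePaths}} are hypothetical names.

```java
// Toy metastore: each restart starts a new epoch, and connections or
// tickets from an earlier epoch are rejected. Purely illustrative.
class Metastore {
    private int epoch = 0;

    synchronized void restart() { epoch++; }

    synchronized int currentEpoch() { return epoch; }

    // Non-secure connection: no ticket needed.
    synchronized HmsConnection connectInsecure() {
        return new HmsConnection(this, epoch);
    }

    // Secure connection: requires a ticket issued in the current epoch.
    synchronized HmsConnection connectSecure(int ticketEpoch) {
        if (ticketEpoch != epoch) {
            throw new IllegalStateException("stale ticket");
        }
        return new HmsConnection(this, epoch);
    }
}

class HmsConnection {
    private final Metastore store;
    private final int epoch;

    HmsConnection(Metastore store, int epoch) {
        this.store = store;
        this.epoch = epoch;
    }

    boolean isValid() { return epoch == store.currentEpoch(); }

    // Tickets can only be obtained over a live connection.
    int issueTicket() {
        if (!isValid()) {
            throw new IllegalStateException("connection closed");
        }
        return epoch;
    }
}

class FailurePaths {
    // Path 1: the secure connection timed out, so ask the (possibly
    // stale) non-secure connection for a new ticket.
    static String renewTicket(HmsConnection insecure) {
        try {
            insecure.issueTicket();
            return "renewed";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    // Path 2: a send on a cached secure connection failed, so try to
    // reconnect with the ticket from the prior session.
    static String reconnect(Metastore store, int oldTicket) {
        try {
            store.connectSecure(oldTicket);
            return "reconnected";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }
}
```

Before a restart both paths succeed; after one, path 1 reports {{connection closed}} and path 2 reports {{stale ticket}}, so neither can recover on its own.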
What we really need to do is:
* Recreate both the insecure *and* secure connections.
But since the secure connection cache is held on the insecure connection, we
can't easily recreate that connection: we'd get a new object without the cache.
So, we have to make some changes.
* Hold the secure connection cache on an object other than a connection.
* Use a connection proxy instead of the connection as key to the cache. The
proxy lets us keep the cache entry while replacing the secure connection
with a new one. (The proxy is just a wrapper around a replaceable secure
connection.)
* Similarly, provide a thread-safe way to reconnect the non-secure connection
used to get tickets for the secure connection.
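One way the proposed changes might fit together, as a sketch under toy assumptions: a restart bumps a generation number, a "ticket" is just the non-secure client's generation, and the cache maps user names to proxies. {{ToyMetastore}}, {{Client}}, {{ClientProxy}} and {{ClientManager}} are hypothetical names, not Drill classes. The cache lives on a manager object rather than on a connection, each cache entry is a proxy whose delegate can be swapped, and the non-secure client is rebuilt under a lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy metastore: restart() starts a new generation and invalidates
// every client opened in an earlier one.
final class ToyMetastore {
    private final AtomicInteger generation = new AtomicInteger();

    void restart() { generation.incrementAndGet(); }
    int generation() { return generation.get(); }
    boolean isValid(Client c) { return c.generation == generation(); }
}

// Hypothetical stand-in for a metastore client.
final class Client {
    final String user;
    final int generation;

    Client(String user, int generation) {
        this.user = user;
        this.generation = generation;
    }
}

// Stable cache entry that wraps a replaceable secure client.
final class ClientProxy {
    private final String user;
    private final Function<String, Client> connector;
    private final ToyMetastore metastore;
    private Client delegate; // guarded by "this"

    ClientProxy(String user, Function<String, Client> connector, ToyMetastore metastore) {
        this.user = user;
        this.connector = connector;
        this.metastore = metastore;
    }

    // Returns a live client, reconnecting if the metastore restarted.
    synchronized Client client() {
        if (delegate == null || !metastore.isValid(delegate)) {
            delegate = connector.apply(user);
        }
        return delegate;
    }
}

// Owns the cache and the non-secure client. It is not itself a
// connection, so both can be replaced without losing the cache.
final class ClientManager {
    private final ToyMetastore metastore;
    private final ConcurrentMap<String, ClientProxy> cache = new ConcurrentHashMap<>();
    private Client processClient; // guarded by "this"

    ClientManager(ToyMetastore metastore) { this.metastore = metastore; }

    // Thread-safe reconnect of the non-secure client used to obtain tickets.
    synchronized Client processClient() {
        if (processClient == null || !metastore.isValid(processClient)) {
            processClient = new Client("drill", metastore.generation());
        }
        return processClient;
    }

    // Opens a secure client using a "ticket" (the generation) taken
    // from a live non-secure client.
    private Client openSecure(String user) {
        return new Client(user, processClient().generation);
    }

    ClientProxy proxyFor(String user) {
        return cache.computeIfAbsent(user, u -> new ClientProxy(u, this::openSecure, metastore));
    }
}
```

Because the proxy, not the connection, is what the cache holds, a restart replaces only the delegate: the cache entry survives, and the first caller after the restart transparently rebuilds both the non-secure and the secure client. Only the per-proxy and per-manager locks are taken, not a global one.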
All this is not a huge project, but it is more than can be done in the context
of a quick fix for this ticket. So, for this ticket I used a bit of a hack:
just throw away the entire schema builder and create a new one. But, that
solution requires synchronizing all requests and is far from ideal.
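For comparison, the stopgap described above (throw everything away and serialize requests) looks roughly like this. {{SchemaFactoryStub}}, {{HivePluginModel}} and {{withFactory}} are hypothetical names, and the sketch assumes a failed request is retried once on a freshly built factory.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for everything the plugin builds at startup:
// the non-secure connection plus the cache of secure ones.
final class SchemaFactoryStub {
    final int generation;

    SchemaFactoryStub(int generation) { this.generation = generation; }
}

final class HivePluginModel {
    private final Supplier<SchemaFactoryStub> builder;
    private SchemaFactoryStub factory; // guarded by "this"
    private int rebuilds = 0;          // guarded by "this"

    HivePluginModel(Supplier<SchemaFactoryStub> builder) {
        this.builder = builder;
        this.factory = builder.get();
    }

    // Every request takes the same lock, which is the cost of this approach.
    synchronized <T> T withFactory(Function<SchemaFactoryStub, T> request) {
        try {
            return request.apply(factory);
        } catch (RuntimeException e) {
            // Throw the whole factory away, build a new one, and retry once.
            factory = builder.get();
            rebuilds++;
            return request.apply(factory);
        }
    }

    synchronized int rebuilds() { return rebuilds; }
}
```

The single {{synchronized}} method is what keeps this simple and also what makes it slow: every request, including ones from unrelated users, waits on the same lock.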
> Must restart drillbits whenever a secure Hive metastore is restarted
> --------------------------------------------------------------------
>
> Key: DRILL-5496
> URL: https://issues.apache.org/jira/browse/DRILL-5496
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> DRILL-4964: "Drill fails to connect to hive metastore after hive metastore is
> restarted unless drillbits are restarted also" attempted to fix a bug in
> Drill in which Drill hangs if Hive is restarted. Now, we see that all
> subsequent "show schemas" queries fail.
> Steps to repro:
> 1. Build a secure cluster (we used MapR)
> 2. Install Hive and Drill services
> 3. Configure Drill impersonation and authentication
> 4. Restart hivemeta service
> 5. Connect to Drill and execute a query involving Hive storage; the issue occurs
> 6. Restart the drillbit services and execute the query; the issue no longer
> occurs
> The problem occurs in the same place as the earlier fix, but might represent
> a slightly different use case: in this case the connection is secure.