[
https://issues.apache.org/jira/browse/SENTRY-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kalyan kumar kalvagadda updated SENTRY-1813:
--------------------------------------------
Fix Version/s: (was: 2.0.0)
2.1.0
Moving all unresolved JIRAs with fix version 2.0.0 to 2.1.0. Please change the
fix version back if you intend to get this into the 2.0.0 release.
> LeaderStatusMonitor could get into limbo state upon ZK connection loss
> ----------------------------------------------------------------------
>
> Key: SENTRY-1813
> URL: https://issues.apache.org/jira/browse/SENTRY-1813
> Project: Sentry
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Vamsee Yarlagadda
> Assignee: Vamsee Yarlagadda
> Fix For: 2.1.0
>
> Attachments: Screenshot.png
>
>
> During failover testing, I noticed that if the Sentry servers lose their
> connection to ZK, the server that is currently the leader gets into a limbo
> state, because it interrupts its Curator-LeaderSelector thread, which is then
> never revived in the running Sentry process (unless the process is
> restarted).
> Relevant code under LeaderStatusMonitor
> http://github.mtv.cloudera.com/CDH/sentry/blob/cdh5-1.5.1/sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/LeaderStatusMonitor.java#L243-L246
> {code}
> try {
>   isLeader = true;
>   // Wait until we are interrupted or receive a signal
>   cond.await();
> } catch (InterruptedException ignored) {
>   Thread.currentThread().interrupt();
>   LOG.info("LeaderStatusMonitor: interrupted");
> } finally {
>   isLeader = false;
>   lock.unlock();
>   LOG.info("LeaderStatusMonitor: becoming standby");
> }
> {code}
> I realized that upon loss of the ZK connection, the Curator framework raises
> an InterruptedException in LeaderStatusMonitor, whose handler then calls
> interrupt() on Thread.currentThread(), which is essentially the
> *Curator-LeaderSelector* thread.
> <SCREENSHOT_ATTACHED>
> So once the LeaderSelector thread is interrupted, this particular Sentry
> server loses the ability to participate in future leader elections. And if
> this happens to all the Sentry servers in the cluster, any further connection
> loss could leave the whole cluster in a limbo state.
> While in this state, Sentry no longer reads events from HMS, so users can
> no longer issue DDL statements such as CREATE. However, GRANT and REVOKE
> still work, as they don't go through HMSFollower.
>
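
To illustrate the mechanism described above, here is a minimal JDK-only
sketch. It does not use Curator or Sentry code. The class InterruptFlagDemo and
the methods holdLeadership and canReenterElection are hypothetical names made
up for this example. It only shows the plain Java behaviour that a re-asserted
interrupt flag makes the next interruptible blocking call on the same thread
fail immediately. Whether this is exactly how the Curator-LeaderSelector thread
ends up stuck is an assumption and is not confirmed by the report.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not part of Sentry or Curator.
class InterruptFlagDemo {
    private static final ReentrantLock lock = new ReentrantLock();
    private static final Condition cond = lock.newCondition();

    // Shaped like the quoted handler: block on a condition until interrupted.
    static void holdLeadership(boolean reassertInterrupt) {
        lock.lock();
        try {
            // Wait until we are interrupted or receive a signal
            cond.await();
        } catch (InterruptedException e) {
            // await() cleared the interrupt flag when it threw.
            if (reassertInterrupt) {
                // Same call as in the quoted code: sets the flag again.
                Thread.currentThread().interrupt();
            }
        } finally {
            lock.unlock();
        }
    }

    // Runs one "leadership term" on a worker thread, interrupts it, and
    // reports whether the worker could then complete another interruptible
    // blocking call (an assumed stand-in for rejoining an election).
    static boolean canReenterElection(boolean reassertInterrupt) {
        AtomicBoolean reentered = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            holdLeadership(reassertInterrupt);
            try {
                // Throws at once if the interrupt flag is still set,
                // even though the lock itself is free.
                lock.lockInterruptibly();
                try {
                    reentered.set(true);
                } finally {
                    lock.unlock();
                }
            } catch (InterruptedException e) {
                // Flag was still set, so the blocking call was refused.
            }
        }, "demo-leader-selector");
        worker.start();
        worker.interrupt(); // simulated connection-loss interrupt
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return reentered.get();
    }

    public static void main(String[] args) {
        System.out.println("re-assert flag: " + canReenterElection(true));
        System.out.println("swallow flag:   " + canReenterElection(false));
    }
}
```

In this sketch the re-asserting variant should fail to re-enter, while the
variant that leaves the flag cleared should succeed. This is not a suggested
fix for LeaderStatusMonitor: swallowing interrupts has its own drawbacks, and
any actual change would need to be checked against how Curator's LeaderSelector
handles interruption.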