[ https://issues.apache.org/jira/browse/SENTRY-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Na Li resolved SENTRY-1813.
---------------------------
    Resolution: Duplicate
    Fix Version/s:     (was: 2.1.0)
                   2.0.0

> LeaderStatusMonitor could get into limbo state upon ZK connection loss
> ----------------------------------------------------------------------
>
>                 Key: SENTRY-1813
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1813
>             Project: Sentry
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vamsee Yarlagadda
>            Assignee: Vamsee Yarlagadda
>             Fix For: 2.0.0
>
>         Attachments: Screenshot.png
>
>
> I noticed during failover testing that if the Sentry servers lose their
> connection to ZK, the server that is currently the leader gets into a limbo
> state: it interrupts the Curator-LeaderSelector thread, which is never
> revived in the running Sentry process (unless the process is restarted).
> Relevant code under LeaderStatusMonitor
> http://github.mtv.cloudera.com/CDH/sentry/blob/cdh5-1.5.1/sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/LeaderStatusMonitor.java#L243-L246
> {code}
>     try {
>       isLeader = true;
>       // Wait until we are interrupted or receive a signal
>       cond.await();
>     } catch (InterruptedException ignored) {
>       Thread.currentThread().interrupt();
>       LOG.info("LeaderStatusMonitor: interrupted");
>     } finally {
>       isLeader = false;
>       lock.unlock();
>       LOG.info("LeaderStatusMonitor: becoming standby");
>     }
> {code}
> I realized that even upon the loss of the ZK connection, the Curator
> framework raises an InterruptedException in LeaderStatusMonitor, which then
> calls interrupt() on Thread.currentThread(). That thread is essentially the
> *Curator-LeaderSelector* thread.
> <SCREENSHOT_ATTACHED>
> So if the LeaderSelector thread is interrupted, this particular Sentry server
> loses the ability to participate in any future leader election. And if this
> happens to all the Sentry servers in the cluster, any further connection loss
> could leave the cluster in a limbo state.
> While in this state, Sentry no longer reads events from HMS, so users can
> no longer issue DDL statements such as CREATE. However, GRANT and REVOKE
> still work because they don't go through HMSFollower.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
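
The effect of re-asserting the interrupt flag, as the catch block quoted in the report does, can be illustrated with plain JDK classes. The following is a minimal sketch and not Sentry or Curator code: the names InterruptFlagDemo, holdLeadership and nextBlockingCallFails are hypothetical, and Thread.sleep is only an assumed stand-in for whatever blocking call the selector thread would make next. It shows that once the flag is restored, the thread's next blocking call is aborted immediately with an InterruptedException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class InterruptFlagDemo {

    // Hypothetical stand-in for a leadership callback: block on a condition
    // until interrupted, then optionally restore the thread's interrupt flag.
    static void holdLeadership(ReentrantLock lock, Condition cond, boolean restoreFlag) {
        lock.lock();
        try {
            while (true) {
                cond.await(); // the loop guards against spurious wakeups
            }
        } catch (InterruptedException e) {
            // Throwing InterruptedException cleared the flag; this sets it again.
            if (restoreFlag) {
                Thread.currentThread().interrupt();
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the worker's next blocking call after the callback
    // was aborted by a pending interrupt.
    static boolean nextBlockingCallFails(boolean restoreFlag) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        CountDownLatch started = new CountDownLatch(1);
        boolean[] failed = new boolean[1];

        Thread worker = new Thread(() -> {
            started.countDown();
            holdLeadership(lock, cond, restoreFlag);
            try {
                // Assumed stand-in for the thread's next blocking operation.
                Thread.sleep(10);
            } catch (InterruptedException e) {
                failed[0] = true;
            }
        });

        worker.start();
        started.await();
        worker.interrupt(); // simulates the interrupt delivered on connection loss
        worker.join();      // join makes the write to failed[0] visible here
        return failed[0];
    }
}
```

Calling nextBlockingCallFails(true) returns true, while nextBlockingCallFails(false) returns false. Restoring the flag is normally the recommended way to propagate an interrupt to the caller, so whether it is harmful depends on how the calling thread reacts to a pending interrupt.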