[
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vadim Spector updated SENTRY-1866:
----------------------------------
Summary: Add ping Thrift interface to the existing Sentry clients (was:
Log Sentry server failover events on Sentry clients in HA scenario)
> Add ping Thrift interface to the existing Sentry clients
> --------------------------------------------------------
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
> Issue Type: Improvement
> Reporter: Vadim Spector
>
> Sentry HA-specific: when the Sentry client fails over from one Sentry server
> to another, it does not print a message that it has done so. Have such a
> client print a simple, clear INFO-level message when the client fails over
> from one Sentry server to another.
> Design considerations:
> "Sentry client" stands for a specific class instance capable of connecting to
> a specific Sentry server instance from some app (usually another Hadoop
> service). In HA scenario, Sentry client relies on connection pooling
> (SentryTransportPool class) to select one of several available configured
> Sentry server instances. Whenever connection fails, Sentry client simply asks
> SentryTransportPool to a) invalidate this specific connection and b) get
> another connection instead. There is no monitoring of Sentry server
> liveliness per se. Each Sentry client finds out about a failure independently
> and only at the time of trying to use it. Thus there may be no particular
> correlation between the time of the discovery of connection failure and the
> time Sentry server actually becomes unavailable. E.g. a client can discover a
> failure of the old connection, long after Sentry server crushed and then was
> restarted (and maybe restarted more than once!).
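> For illustration, here is a minimal, hedged sketch of the invalidate-and-retry
> pattern described above. The pool interface below is hypothetical and only
> mirrors the behavior described here; it is not the actual SentryTransportPool
> API.
> {code:java}
> import org.apache.thrift.transport.TTransport;
> import org.apache.thrift.transport.TTransportException;
>
> public class FailoverSketch {
>   /** Hypothetical stand-in for SentryTransportPool, for illustration only. */
>   interface TransportPool {
>     TTransport get() throws TTransportException;   // pick any configured server
>     void invalidate(TTransport t);                  // drop a broken connection
>   }
>
>   interface ThriftCall<T> {
>     T apply(TTransport transport) throws TTransportException;
>   }
>
>   private final TransportPool pool;
>   private final int maxAttempts;
>
>   FailoverSketch(TransportPool pool, int maxAttempts) {
>     this.pool = pool;
>     this.maxAttempts = maxAttempts;
>   }
>
>   <T> T invoke(ThriftCall<T> call) throws TTransportException {
>     TTransportException last = null;
>     for (int i = 0; i < maxAttempts; i++) {
>       TTransport transport = pool.get();
>       try {
>         return call.apply(transport);               // normal case: first attempt works
>       } catch (TTransportException e) {
>         pool.invalidate(transport);                 // a) invalidate this connection
>         last = e;                                   // b) loop back to get another one
>       }
>     }
>     throw last != null ? last : new TTransportException("no attempt made");
>   }
> }
> {code}
> Note that a failure is only ever observed inside invoke(), i.e. at the moment
> the client tries to use the connection, which is exactly why the failure
> discovery time and the actual server outage time can be far apart.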
> Intuitively, one would like to have a single log message per Sentry server
> crash/shutdown; but for the reasons above, it seems difficult, if not
> impossible, to group connections by the Sentry server instance(s) that were
> running when these connections were initiated. It may therefore be challenging
> to say whether multiple connection failures have to do with "the same" Sentry
> server instance going down, which makes it difficult to report exactly one
> connection failure per Sentry server shutdown/crash event.
> Yet, the desire to have visibility into such events in the field is
> understandable. At the same time, if we simply log every connection failure,
> such logging can be massive - there may be many concurrent connections to
> Sentry server(s) from the same app - and therefore less than useful. Any
> solution will have to rely on somewhat imperfect rules that keep the number of
> connection failure logs contained. An alternative solution - introducing
> periodic pinging of the Sentry server and only logging ping failures - would
> be possible as well (and it would be ideal if the Sentry server responded to
> pings with a server-id initialized to the server start timestamp; that would
> solve the problem completely), but it requires more radical changes.
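> To make the ping alternative concrete, here is a hedged client-side sketch. It
> assumes a hypothetical ping() RPC that returns the server start timestamp;
> neither the interface nor the method exists in Sentry today, they only
> illustrate what the proposed Thrift addition could enable.
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> /** Detects Sentry server restarts from a hypothetical ping() start-time reply. */
> public class ServerRestartDetector {
>   private static final Logger LOG = LoggerFactory.getLogger(ServerRestartDetector.class);
>
>   /** Hypothetical Thrift client: ping() returns the server start timestamp (ms). */
>   interface PingClient {
>     long ping() throws Exception;
>   }
>
>   private final PingClient client;
>   private long lastSeenStartTime = -1L;
>
>   ServerRestartDetector(PingClient client) {
>     this.client = client;
>   }
>
>   /** Call periodically; logs exactly one INFO message per detected restart. */
>   void checkOnce() {
>     try {
>       long startTime = client.ping();
>       if (lastSeenStartTime >= 0 && startTime != lastSeenStartTime) {
>         LOG.info("Sentry server restarted; new start time {}", startTime);
>       }
>       lastSeenStartTime = startTime;
>     } catch (Exception e) {
>       LOG.info("Sentry server did not respond to ping", e);
>     }
>   }
> }
> {code}
> Because the start timestamp uniquely identifies a server incarnation, this
> would yield exactly one restart message per crash/restart cycle, which is why
> returning the start time in the ping reply would solve the problem completely.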
> The simplest solution seems to be as follows: since the recovery of a failed
> Sentry server is likely to take some time, we do not need to be too clever;
> it may just be enough to report connection failures to a given Sentry
> instance no more often than once every N (configurable value) seconds. If
> one connection failure to Sentry server instance X has been reported, another
> one won't be reported before N seconds expire. This will keep the number of
> connection failure messages at bay. Such logs may still be confusing if a
> client attempts to use an old connection to the old server instance after
> some idle period, long after the problem has been fixed, but this is arguably
> still better than nothing.
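> A minimal sketch of the proposed throttling, assuming the interval N is read
> from some (hypothetical) configuration property and passed in as milliseconds:
> {code:java}
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ConcurrentMap;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> /** Logs at most one connection failure per Sentry server per reporting interval. */
> public class ThrottledFailureLogger {
>   private static final Logger LOG = LoggerFactory.getLogger(ThrottledFailureLogger.class);
>
>   private final long intervalMs;                     // the configurable N, in milliseconds
>   private final ConcurrentMap<String, Long> lastLoggedMs = new ConcurrentHashMap<>();
>
>   public ThrottledFailureLogger(long intervalMs) {
>     this.intervalMs = intervalMs;
>   }
>
>   /** server is the "host:port" of the Sentry instance whose connection just failed. */
>   public void logFailure(String server, Exception cause) {
>     long now = System.currentTimeMillis();
>     Long last = lastLoggedMs.get(server);
>     if (last == null || now - last >= intervalMs) {
>       // Racy by design: a few duplicate messages under heavy concurrency
>       // are acceptable, since the goal is only to bound the log volume.
>       lastLoggedMs.put(server, now);
>       LOG.info("Connection to Sentry server {} failed: {}", server, cause.toString());
>     }
>   }
> }
> {code}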
> Alternative suggestions are welcome.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)