[ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
----------------------------------
    Summary: Add ping Thrift APIs for Sentry services  (was: Add ping Thrift 
interface to the existing Sentry clients)

> Add ping Thrift APIs for Sentry services
> ----------------------------------------
>
>                 Key: SENTRY-1866
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1866
>             Project: Sentry
>          Issue Type: Improvement
>            Reporter: Vadim Spector
>
> Sentry HA-specific: when the Sentry client fails over from one Sentry server 
> to the other, it does not print a message that it has done so. Have such a 
> client print a simple, clear INFO level message when the client fails over 
> from one Sentry server to another.
> Design considerations:
> "Sentry client" stands for a specific class instance capable of connecting to 
> a specific Sentry server instance from some app (usually another Hadoop 
> service). In HA scenario, Sentry client relies on connection pooling 
> (SentryTransportPool class) to select one of several available configured 
> Sentry server instances. Whenever connection fails, Sentry client simply asks 
> SentryTransportPool to a) invalidate this specific connection and b) get 
> another connection instead. There is no monitoring of Sentry server 
> liveness per se. Each Sentry client finds out about a failure independently 
> and only at the time of trying to use the connection. Thus there may be no 
> particular correlation between the time a connection failure is discovered 
> and the time Sentry server actually becomes unavailable. E.g. a client can 
> discover a failure of the old connection long after Sentry server crashed and 
> then was restarted (and maybe restarted more than once!).
> Intuitively, one would like to have a single log per Sentry server 
> crash/shutdown; but due to the explanations above, it seems difficult, if not 
> impossible, to group the connections by the Sentry server instance(s) that 
> were running when these connections were initiated. Therefore, it may be 
> challenging to say whether multiple connection failures have to do with "the 
> same" Sentry server instance going down. As a result, it is difficult to 
> report exactly one connection failure per Sentry server shutdown/crash event.
> Yet, the desire to have visibility into such events in the field is 
> understandable. At the same time, if we simply log every connection failure, 
> such logging can be massive - there may be many concurrent connections to 
> Sentry server(s) from the same app. Such logging would be less than useful.
> The solution therefore has to use some less-than-perfect rules by which the 
> number of connection failure logs can be contained. The alternative solution 
> of introducing periodic pinging of Sentry server and only logging ping 
> failures would be possible as well (and it would be awesome if Sentry server 
> responded to pings with the server-id initialized as the server start 
> timestamp - this would totally solve the problem), but requires more radical 
> changes.
> The simplest solution seems to be as follows: since the recovery of the 
> failed Sentry server is likely to take some time, we do not need to be too 
> clever; it may just be enough to report each connection failure to a given 
> Sentry instance no more often than once every N (configurable value) seconds. 
> If one connection failure to Sentry server instance X has been reported, 
> another one won't be reported before N seconds expire. This will keep the 
> number of connection failure messages at bay. Such logs may still be 
> confusing, if a client attempts to use some old connection from the old 
> server instance after some idle period, and after the problem has long been 
> fixed, but this is arguably still better than nothing.
> Alternative suggestions are welcome.
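
The "no more often than once every N seconds" rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Sentry code: the class name ConnectionFailureThrottle, the method shouldReport, and the use of a server address string as the key are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not part of Sentry): limits failure reports per server. */
final class ConnectionFailureThrottle {
    // Minimum gap between two reports for the same server, in milliseconds.
    private final long intervalMillis;
    // Time of the last reported failure, keyed by server address (assumed key).
    private final ConcurrentMap<String, Long> lastReported = new ConcurrentHashMap<>();

    ConnectionFailureThrottle(long intervalSeconds) {
        this.intervalMillis = intervalSeconds * 1000L;
    }

    /**
     * Returns true if a failure for this server should be logged now,
     * i.e. nothing was reported for it within the last N seconds.
     */
    boolean shouldReport(String serverAddress, long nowMillis) {
        boolean[] report = {false};
        // compute() runs atomically per key, so concurrent callers that fail
        // against the same server at the same time produce a single report.
        lastReported.compute(serverAddress, (addr, last) -> {
            if (last == null || nowMillis - last >= intervalMillis) {
                report[0] = true;
                return nowMillis;   // remember when we last reported
            }
            return last;            // still inside the quiet period
        });
        return report[0];
    }
}
```

A client would presumably call shouldReport(address, System.currentTimeMillis()) in its connection-failure path and emit the INFO message only when it returns true. Passing the time in explicitly is a design choice made here so that the rule can be tested without sleeping.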



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
