[
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vadim Spector updated SENTRY-1866:
----------------------------------
Description:
Motivation: one can think of several, but the immediate ones are:
a) logging Sentry server unavailability on the client side. With multiple active
connections to a Sentry server, logging each failed RPC call (currently at DEBUG
level) to the same Sentry server that went down can be redundant and far too
verbose. It can also be misleading, because there is no guaranteed link between
when a connection was established and when an attempt to use it fails, so we
may end up reporting failures of stale connections.
b) enabling optimization of connection pooling. A ping RPC call would most
likely fail due to server unavailability (crash, restart, etc.), so the server
can be temporarily marked as unavailable, and no new connection attempts would
be made within some configurable time interval (say, 1 second).
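For illustration, a minimal sketch of the back-off idea in b), assuming a
hypothetical per-server tracker; none of these class or method names exist in
Sentry today:
{code:java}
// Hypothetical sketch: track per-server availability based on ping results.
import java.util.concurrent.ConcurrentHashMap;

public class ServerAvailabilityTracker {
  private final long retryIntervalMs; // configurable interval, e.g. 1000 ms
  private final ConcurrentHashMap<String, Long> unavailableSince =
      new ConcurrentHashMap<>();

  public ServerAvailabilityTracker(long retryIntervalMs) {
    this.retryIntervalMs = retryIntervalMs;
  }

  // Called when a ping RPC to the given server fails.
  public void markUnavailable(String server) {
    unavailableSince.put(server, System.currentTimeMillis());
  }

  // The connection pool would consult this before dialing the server again.
  public boolean mayAttemptConnection(String server) {
    Long since = unavailableSince.get(server);
    return since == null
        || System.currentTimeMillis() - since >= retryIntervalMs;
  }
}
{code}
The pool would call markUnavailable() when a ping fails and check
mayAttemptConnection() before opening a new connection to the same server.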
was:
Motivation: one can think of several, but the immediate ones are:
a) logging Sentry server unavailability on the client side. With multiple active
connections to a Sentry server, logging each failed RPC call (currently at DEBUG
level) to the same Sentry server that went down can be redundant and far too
verbose. It can also be misleading, because there is no guaranteed link between
when a connection was established and when an attempt to use it fails, so we
may end up reporting failures of stale connections.
Sentry HA-specific: when the Sentry client fails over from one Sentry server to
another, it does not log that it has done so. Such a client should print a
simple, clear INFO-level message whenever it fails over from one Sentry server
to another.
Design considerations:
"Sentry client" stands for a specific class instance capable of connecting to a
specific Sentry server instance from some app (usually another Hadoop service).
In HA scenario, Sentry client relies on connection pooling (SentryTransportPool
class) to select one of several available configured Sentry server instances.
Whenever connection fails, Sentry client simply asks SentryTransportPool to a)
invalidate this specific connection and b) get another connection instead.
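A simplified sketch of this invalidate-and-retry flow; the pool interface and
its method names below are assumptions made for the sketch, not necessarily the
actual SentryTransportPool API:
{code:java}
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// Stand-in for the real SentryTransportPool; methods are illustrative only.
interface SentryTransportPool {
  TTransport getTransport() throws TTransportException;
  void invalidateTransport(TTransport transport);
}

interface RpcCall<T> {
  T invoke(TTransport transport) throws TTransportException;
}

class FailoverExample {
  static <T> T callWithFailover(SentryTransportPool pool, RpcCall<T> call)
      throws TTransportException {
    TTransport transport = pool.getTransport(); // pick any configured server
    try {
      return call.invoke(transport);
    } catch (TTransportException e) {
      pool.invalidateTransport(transport);      // a) invalidate this connection
      transport = pool.getTransport();          // b) get another connection
      return call.invoke(transport);            // single retry, for illustration
    }
  }
}
{code}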
There is no monitoring of Sentry server liveness per se. Each Sentry client
finds out about a failure independently, and only at the moment it tries to use
a connection. Thus there may be no particular correlation between the time a
connection failure is discovered and the time the Sentry server actually
becomes unavailable. E.g., a client can discover the failure of an old
connection long after the Sentry server crashed and was restarted (and maybe
restarted more than once!).
Intuitively, one would like to have a single log entry per Sentry server
crash/shutdown; but for the reasons above, it seems difficult, if not
impossible, to group connections by the Sentry server instance(s) that were
running when those connections were initiated. It may therefore be challenging
to say whether multiple connection failures have to do with "the same" Sentry
server instance going down, which makes it difficult to report exactly one
connection failure per Sentry server shutdown/crash event.
Yet the desire to have visibility into such events in the field is
understandable. At the same time, if we simply logged every connection failure,
the logging could be massive - there may be many concurrent connections to
Sentry server(s) from the same app - and therefore less than useful.
The solution will therefore have to rely on somewhat imperfect rules to contain
the number of connection failure logs. The alternative of introducing periodic
pinging of the Sentry server and logging only ping failures would be possible
as well (and if the Sentry server responded to pings with a server-id
initialized to the server's start timestamp, the instance-attribution problem
would be solved entirely - see the sketch below), but it requires more radical
changes.
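To show why a start-timestamp server-id would help, here is a hypothetical
client-side check; no such ping response field exists in the Sentry Thrift API
today. Two pings returning different start timestamps prove the server
restarted in between, so failures of connections predating the current instance
can be grouped or discarded:
{code:java}
// Hypothetical illustration of the server-id-as-start-timestamp idea.
public class ServerInstanceTracker {
  private long knownStartTimestamp = -1; // -1 means "never pinged yet"

  // Returns true if the server has restarted since the previous ping.
  public synchronized boolean recordPing(long serverStartTimestamp) {
    boolean restarted = knownStartTimestamp != -1
        && serverStartTimestamp != knownStartTimestamp;
    knownStartTimestamp = serverStartTimestamp;
    return restarted;
  }
}
{code}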
The simplest solution seems to be as follows: since recovery of a failed
Sentry server is likely to take some time, we do not need to be too clever; it
may be enough to report each connection failure to a given Sentry instance no
more often than once every N (configurable) seconds. If one connection failure
to Sentry server instance X has been reported, another one won't be reported
before N seconds expire. This keeps the number of connection failure messages
at bay (see the sketch below). Such logs may still be confusing if a client
attempts to use an old connection from a previous server instance after some
idle period, long after the problem has been fixed, but this is arguably still
better than nothing.
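A minimal sketch of this once-per-N-seconds rule, with illustrative class and
method names:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logs at most one connection failure per server every intervalMs milliseconds.
public class ThrottledFailureLogger {
  private static final Logger LOG =
      LoggerFactory.getLogger(ThrottledFailureLogger.class);
  private final long intervalMs; // the configurable N, in milliseconds
  private final ConcurrentHashMap<String, AtomicLong> lastLogged =
      new ConcurrentHashMap<>();

  public ThrottledFailureLogger(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  public void logFailure(String server, Throwable cause) {
    long now = System.currentTimeMillis();
    AtomicLong last = lastLogged.computeIfAbsent(server, s -> new AtomicLong(0L));
    long prev = last.get();
    // Log only if N ms have elapsed and we win the CAS among concurrent callers.
    if (now - prev >= intervalMs && last.compareAndSet(prev, now)) {
      LOG.info("Connection to Sentry server {} failed: {}", server,
          cause.toString());
    }
  }
}
{code}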
Alternative suggestions are welcome.
> Add ping Thrift APIs for Sentry services
> ----------------------------------------
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
> Issue Type: Improvement
> Reporter: Vadim Spector
>