It's been a while since I last looked at these parts...

I also think the general idea is that when you create a ZooKeeper object
on the client side, it asynchronously tries to connect to the server and
publishes its state (connecting / connected / session-timeout / etc.)
through the watcher.
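
To make the watcher idea concrete, here is a rough sketch of an application-level "connect or give up" helper. It is only an illustration: the class name ConnectOrGiveUp and the maxWaitMs parameter are made up for this example, and I have not verified that a rejected x509 authentication actually reaches the client as an AuthFailed event. If the server just closes the connection, only the timeout branch would fire.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical helper (not part of ZooKeeper): waits a bounded time for the
// first "deciding" connection state instead of letting the client retry forever.
final class ConnectOrGiveUp implements Watcher {
    private final CountDownLatch decided = new CountDownLatch(1);
    private final AtomicReference<KeeperState> outcome = new AtomicReference<>();

    @Override
    public void process(WatchedEvent event) {
        KeeperState state = event.getState();
        // Only the first deciding state is recorded; Disconnected etc. are ignored.
        if ((state == KeeperState.SyncConnected || isTerminal(state))
                && outcome.compareAndSet(null, state)) {
            decided.countDown();
        }
    }

    // States after which this sketch stops waiting (an assumption of the example).
    static boolean isTerminal(KeeperState state) {
        return state == KeeperState.AuthFailed || state == KeeperState.Expired;
    }

    // Returns a connected client, or closes it and throws if it did not connect in time.
    static ZooKeeper connect(String connectString, int sessionTimeoutMs, long maxWaitMs)
            throws IOException, InterruptedException {
        ConnectOrGiveUp watcher = new ConnectOrGiveUp();
        ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, watcher);
        boolean signalled = watcher.decided.await(maxWaitMs, TimeUnit.MILLISECONDS);
        if (!signalled || watcher.outcome.get() != KeeperState.SyncConnected) {
            zk.close(); // stop the background reconnect attempts
            throw new IOException("Giving up after " + maxWaitMs
                    + " ms, outcome: " + watcher.outcome.get()); // null means timed out
        }
        return zk;
    }
}
```

Calling something like ConnectOrGiveUp.connect("host:2181", 30000, 10000) would then either return a connected client or fail within roughly ten seconds. The host and the numbers are placeholders.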

I remember that the ZooKeeper class uses the ClientCnxn class to manage the
state of the connection, which has a notion of sessionTimeout and
connectTimeout. It tries to connect to each known server in a round-robin
fashion. Each connection attempt is given 'connectTimeout' time, and I
think a SessionTimeoutException is thrown when no server has responded
within sessionTimeout. (I think by default connectTimeout =
sessionTimeout / number_of_servers.) But I am not entirely sure what
happens after the SessionTimeoutException. Normally I think the ZooKeeper
client doesn't reconnect automatically after a session timeout, as this is
a case that needs to be handled by the client application. (No consistency
can be guaranteed across different sessions, ephemeral znodes will be
deleted, etc.; see:
https://zookeeper.apache.org/doc/r3.6.3/zookeeperOver.html#Guarantees)
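
Just to illustrate the arithmetic I mean, here is a tiny standalone sketch. This is not the actual ClientCnxn code: the class name, the host names and the 30 second session timeout are all made up, and the real client may order the servers differently.

```java
import java.util.List;

// Illustration only: mirrors the timeout arithmetic described above,
// it is NOT copied from ClientCnxn.
final class ConnectTimeoutSketch {

    // Per-server connect budget: the session timeout split evenly across servers.
    static int connectTimeoutMs(int sessionTimeoutMs, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        return sessionTimeoutMs / serverCount;
    }

    // Round-robin choice of the server used for the n-th connection attempt.
    static String serverForAttempt(List<String> servers, int attempt) {
        return servers.get(Math.floorMod(attempt, servers.size()));
    }

    public static void main(String[] args) {
        // Hypothetical ensemble and session timeout.
        List<String> servers = List.of("zk1:2181", "zk2:2181", "zk3:2181");
        int sessionTimeoutMs = 30_000;

        // Prints 10000: each attempt gets a third of the session timeout.
        System.out.println(connectTimeoutMs(sessionTimeoutMs, servers.size()));

        // Prints zk1:2181, zk2:2181, zk3:2181, zk1:2181 (one per line).
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println(serverForAttempt(servers, attempt));
        }
    }
}
```

So with three servers, one full pass over the ensemble takes about as long as the session timeout itself.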

But if no active session has been created yet, then maybe there is an
infinite retry logic in the client.

I don't have much time right now to dig deeper into these classes. I would
assume we already have some unit tests around here too, which could be
checked to see the expected behaviour.

Also, I don't know exactly how authentication failure is handled on the
client side... The server might fall back to an 'unauthenticated session'
in case of authentication failures, or it can refuse the connection attempt
(this can be configured, at least for SASL authentication:
'zookeeper.sessionRequireClientSASLAuth').

Also, I think the best would be to actually test this with your exact setup.
(E.g. on the clusters we use, we still run ZooKeeper 3.5 in production with
SSL encryption + Kerberos authentication... which might behave differently
from your setup with 3.6.3... and you might also use x509
authentication?) But it shouldn't be hard to emulate some authentication
failures with your setup.

Best regards,
Mate

On Fri, Jun 17, 2022 at 11:23 PM Rahul Rane <rr...@linkedin.com.invalid>
wrote:

> Bumping up on this one.
>
> Thanks,
> Rahul Rane
>
> From: Rahul Rane <rr...@linkedin.com>
> Date: Wednesday, May 25, 2022 at 2:57 PM
> To: dev@zookeeper.apache.org <dev@zookeeper.apache.org>
> Subject: Few questions on connection retry on auth failure.
>
> Hello team,
>
>
>
> We need some help in understanding the zookeeper expected behavior and
> potential solution to the problem.
>
>
>
> Context :
>
> We have extended ServerAuthenticationProvider with x509 scheme based on
> 3.6.3 zookeeper server. We are trying to understand connection retry
> scenario. On auth failure, we see that zookeeper client retries to
> establish connection with server until the timeout or infinitely if no
> timeout is set. We are using
> org.apache.zookeeper.server.NettyServerCnxnFactory as Server connection
> factory.
>
>
>
> Couple of questions :
>
>   1.  Is zookeeper client supposed to retry infinitely on auth failure
> from zookeeper server?
>   2.  Is there a way zookeeper client does not perform infinitely retries
> on auth failure errors and bails out after first auth failure itself?
>   3.  We can’t find anything about auth failure errors in zookeeper client
> logs but just that connection is closed. After looking into Netty Server
> code, we see the auth failure is not communicated to client but got masked
> here<
> https://github.com/linkedin/zookeeper/blob/8bcaf7bb3cfa6470e1660e2b36964ae2284197df/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java#L99>.
> So we were wondering if we are missing something here?
>
>
>
> Thanks for the help and let me know if you need any clarification on any
> of the questions.
>
>
>
> Thanks,
>
> Rahul Rane
>
