Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

Ted Yu Thu, 02 Nov 2017 15:00:30 -0700

Stephane:
bq. hasn't acted in over a year

That suggests some reluctance in the ZooKeeper community to fully solve
the issue (perhaps for technical reasons).
Either way, we should not count on the fix landing in the near future.


As for Jun's latest suggestion, I think we should add periodic logging
indicating the retry.

A KIP is not needed if we go that route.
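
As a rough illustration of "retry forever, with periodic logging", here is a
minimal Java sketch. It is not Kafka's actual code: the class and method names
(RetryForever, retryUntilSuccess) and the parameters (backoffMs, logEvery) are
hypothetical, and the Callable stands in for whatever re-creates the ZooKeeper
client.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Kafka or ZooKeeper.
final class RetryForever {
    private RetryForever() {}

    /**
     * Calls create until it succeeds. Sleeps backoffMs between attempts and
     * logs the first failure plus every logEvery-th failure after it, so the
     * retry stays visible without flooding the log.
     */
    static <T> T retryUntilSuccess(Callable<T> create,
                                   long backoffMs,
                                   int logEvery,
                                   Consumer<String> log) {
        if (logEvery < 1) {
            throw new IllegalArgumentException("logEvery must be >= 1");
        }
        long failures = 0;
        while (true) {
            try {
                return create.call();
            } catch (InterruptedException e) {
                // Stop retrying if the thread is being shut down.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while re-creating the client", e);
            } catch (Exception e) {
                failures++;
                if ((failures - 1) % logEvery == 0) {
                    log.accept("Attempt " + failures + " failed (" + e
                            + "); still retrying every " + backoffMs + " ms");
                }
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", e);
            }
        }
    }
}
```

With backoffMs = 1000 and logEvery = 60, for example, this would log roughly
once a minute for as long as the re-creation keeps failing.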

Cheers

On Thu, Nov 2, 2017 at 2:54 PM, Stephane Maarek <
[email protected]> wrote:

> Hi Jun
>
> I think this is a better option. Would that change still require a KIP,
> given that it's not a change in public API?
>
> @Ted, it was marked as a blocker for 3.4.11 but they pushed it back. It
> seems that the owner of the PR hasn't acted in over a year and I think
> someone needs to take ownership of that. Additionally, this would be a
> change in Kafka's ZooKeeper client dependency, so there would be no need to
> update your ZooKeeper quorum to benefit from the change.
>
> Thanks
> Stéphane
>
>
> On 3 Nov. 2017 8:45 am, "Jun Rao" <[email protected]> wrote:
>
> Stephane, Jeff,
>
> Another option is to not expose the reconnect timeout config and just retry
> the creation of Zookeeper forever. This is an improvement from the current
> situation and if zookeeper-2184 is fixed in the future, we don't need to
> deprecate the config.
>
> Thanks,
>
> Jun
>
> On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu <[email protected]> wrote:
>
> > ZOOKEEPER-2184 is scheduled for 3.4.12, whose release date is unknown.
> >
> > I think adding the session recreation on Kafka side should benefit Kafka
> > users, especially those who don't plan to move to 3.4.12+ in the near
> > future.
> >
> > On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao <[email protected]> wrote:
> >
> > > Hi, Stephane,
> > >
> > > 3) The difference is that currently, there is no retry when re-creating
> > > the Zookeeper object when a ZK session expires. So, if the re-creation of
> > > Zookeeper fails, the broker just logs the error and the Zookeeper object
> > > will never be created again. With this KIP, we will keep retrying the
> > > creation of Zookeeper until success.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> > > [email protected]> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 1) The reason I'm asking about it is I wonder if it's not worth focusing
> > > > the development efforts on taking ownership of the existing PR
> > > > (https://github.com/apache/zookeeper/pull/150) to fix ZOOKEEPER-2184,
> > > > rebase it and have it merged into the ZK codebase shortly. I feel this
> > > > KIP might introduce a setting that could be deprecated shortly and
> > > > confuse the end user a bit further with one more knob to turn.
> > > >
> > > > 3) I'm not sure if I fully understand, sorry for the beginner's question:
> > > > if the default timeout is infinite, then it won't change anything about
> > > > how Kafka works today, will it? (Unless I'm missing something, sorry.)
> > > > If it is not set to infinite, then do we introduce the risk of a whole
> > > > cluster shutting down at once?
> > > >
> > > > Thanks,
> > > > Stephane
> > > >
> > > > On 31/10/17, 1:00 pm, "Jun Rao" <[email protected]> wrote:
> > > >
> > > >     Hi, Stephane,
> > > >
> > > >     Thanks for the reply.
> > > >
> > > >     1) Fixing the issue in ZK will be ideal. Not sure when it will happen
> > > >     though. Once it's fixed, we can probably deprecate this config.
> > > >
> > > >     2) That could be useful. Is there a java api to do that at runtime?
> > > >     Also, invalidating DNS cache doesn't always fix the issue of unresolved
> > > >     host. In some of the cases, human intervention is needed.
> > > >
> > > >     3) The default timeout is infinite though.
> > > >
> > > >     Jun
> > > >
> > > >
> > > >     On Sat, Oct 28, 2017 at 11:48 PM, Stephane Maarek <
> > > >     [email protected]> wrote:
> > > >
> > > >     > Hi Jun,
> > > >     >
> > > >     > I think this is very helpful. Restarting Kafka brokers in case of
> > > >     > zookeeper host change is not a well known operation.
> > > >     >
> > > >     > Few questions:
> > > >     > 1) would it not be worth fixing the problem at the source? This has
> > > >     > been stuck for a while though, maybe a little push would help:
> > > >     > https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-2184
> > > >     >
> > > >     > 2) upon recreating the zookeeper object, is it not possible to
> > > >     > invalidate the DNS cache so that it resolves the new hostname?
> > > >     >
> > > >     > 3) could the cluster be down in this situation: one migrates an
> > > >     > entire zookeeper cluster to new machines (one by one). The quorum is
> > > >     > still alive without downtime, but now every broker in a cluster can't
> > > >     > resolve zookeeper at the same time. They all shut down at the same
> > > >     > time after the new time-out setting.
> > > >     >
> > > >     > Thanks !
> > > >     > Stéphane
> > > >     >
> > > >     > On 28 Oct. 2017 9:42 am, "Jun Rao" <[email protected]> wrote:
> > > >     >
> > > >     > > Hi, Everyone,
> > > >     > >
> > > >     > > We created "KIP-217: Expose a timeout to allow an expired ZK
> > > >     > > session to be re-created".
> > > >     > >
> > > >     > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-217%3A+Expose+a+timeout+to+allow+an+expired+ZK+session+to+be+re-created
> > > >     > >
> > > >     > > Please take a look and provide your feedback.
> > > >     > >
> > > >     > > Thanks,
> > > >     > >
> > > >     > > Jun
> > > >     > >
> > > >     >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
