[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700187#comment-14700187 ]

Flavio Junqueira commented on KAFKA-1387:
-----------------------------------------

bq. I thought that when the previous session has ended (e.g. expired), its 
ephemeral node will be "eventually" removed?

If the session ends cleanly, with the client submitting a closeSession request, 
then the session closes and the ephemerals are deleted as part of that request. 
But if the client crashes and the server simply stops hearing from it, the 
session has to time out and expire before the ephemerals are deleted, which 
takes some time.

bq. Does ZooKeeper itself have a leasing mechanism?

I'm referring to the fact that the ephemeral represents a lease that is revoked 
when the session times out.
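
To make the lease analogy concrete, here is a minimal in-memory sketch of the 
two ways a session can end. It is not the ZooKeeper API: the class 
EphemeralLeaseSketch, its methods, and the /broker/1 path are all hypothetical 
names chosen for illustration, and time is passed in explicitly instead of 
being read from a clock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of ephemeral znodes as leases; hypothetical, not the ZooKeeper API. */
class EphemeralLeaseSketch {
    // path -> id of the session that owns the ephemeral node
    private final Map<String, Long> ephemeralOwner = new HashMap<>();
    // session id -> time (ms) at which the server last heard from the client
    private final Map<Long, Long> lastHeard = new HashMap<>();
    private final long sessionTimeoutMs;

    EphemeralLeaseSketch(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Opens the session if needed and renews its lease. */
    void heartbeat(long sessionId, long nowMs) {
        lastHeard.put(sessionId, nowMs);
    }

    /** Returns true if the node was created, false if the path already exists. */
    boolean createEphemeral(String path, long sessionId) {
        if (!lastHeard.containsKey(sessionId)) {
            throw new IllegalStateException("no such session: " + sessionId);
        }
        return ephemeralOwner.putIfAbsent(path, sessionId) == null;
    }

    /** Clean shutdown: the ephemerals are deleted together with the close request. */
    void closeSession(long sessionId) {
        endSession(sessionId);
    }

    /** Server-side check: expire every session not heard from within the timeout. */
    void tick(long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (Map.Entry<Long, Long> e : lastHeard.entrySet()) {
            if (nowMs - e.getValue() > sessionTimeoutMs) {
                expired.add(e.getKey());
            }
        }
        for (long sessionId : expired) {
            endSession(sessionId);
        }
    }

    boolean exists(String path) {
        return ephemeralOwner.containsKey(path);
    }

    private void endSession(long sessionId) {
        lastHeard.remove(sessionId);
        ephemeralOwner.values().removeIf(owner -> owner == sessionId);
    }

    public static void main(String[] args) {
        EphemeralLeaseSketch zk = new EphemeralLeaseSketch(100);
        zk.heartbeat(1L, 0L);
        zk.createEphemeral("/broker/1", 1L);
        zk.tick(50L);  // client crashed, but the lease has not run out yet
        System.out.println("exists before timeout: " + zk.exists("/broker/1"));
        zk.tick(101L); // the lease is revoked only once the timeout has passed
        System.out.println("exists after timeout: " + zk.exists("/broker/1"));
    }
}
```

In this model closeSession removes the node immediately, while a crashed 
client's node survives until a tick observes that the timeout has elapsed.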

I'm not sure if this is clear, but one of the problems I'm pointing out is that 
zkclient might end up creating the ephemeral znode in your *current* session. 
In this case, the znode won't go away. Here is actually another problem I found 
along the same lines. The createEphemeral call in ZkClient ends up calling 
retryUntilConnected, which retries even when the session expires:

{code}
            try {
                return callable.call();
            } catch (ConnectionLossException e) {
                // we give the event thread some time to update the status to 'Disconnected'
                Thread.yield();
                waitForRetry();
            } catch (SessionExpiredException e) {
                // we give the event thread some time to update the status to 'Expired'
                Thread.yield();
                waitForRetry();
            }
{code}

In this case, say that one call to createEphemeral via handleNewSession happens 
during a given session, but the session expires before the operation goes 
through. The client will retry with the new session. When the consumer then 
tries to create the znode again, the call fails because the znode already 
exists, and it won't go away. This is another case in which the znode won't go 
away because it has been created in the current session.
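
The scenario above can be played through with a small in-memory sketch. This is 
a toy model, not ZkClient or ZooKeeper code: the class 
RetryAcrossSessionsSketch, the expireSessionOnNextCall hook, the 
ownedByCurrentSession check, and the /broker/1 path are hypothetical names 
introduced for illustration, and the retry loop takes a Supplier where the 
quoted code takes a callable. Only the shape of the loop, swallowing the 
session expiry and retrying, follows the snippet quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model, not ZkClient: a create that is retried across a session expiry. */
class RetryAcrossSessionsSketch {
    static class SessionExpiredException extends RuntimeException {}
    static class NodeExistsException extends RuntimeException {}

    // path -> id of the session that owns the ephemeral node
    private final Map<String, Long> ephemeralOwner = new HashMap<>();
    private long currentSessionId = 1L;
    private boolean expireOnNextCall = false;

    /** Hypothetical hook: make the next create() observe a session expiry. */
    void expireSessionOnNextCall() {
        expireOnNextCall = true;
    }

    long currentSessionId() {
        return currentSessionId;
    }

    /** Raw create: throws if the session has expired or the path already exists. */
    void create(String path) {
        if (expireOnNextCall) {
            expireOnNextCall = false;
            final long expired = currentSessionId;
            ephemeralOwner.values().removeIf(owner -> owner == expired);
            currentSessionId++; // the client reconnects under a fresh session
            throw new SessionExpiredException();
        }
        if (ephemeralOwner.putIfAbsent(path, currentSessionId) != null) {
            throw new NodeExistsException();
        }
    }

    /** Same shape as the quoted loop: expiry is swallowed and the call is retried. */
    <T> T retryUntilConnected(Supplier<T> operation) {
        while (true) {
            try {
                return operation.get();
            } catch (SessionExpiredException e) {
                Thread.yield(); // retry; by now the client holds a new session
            }
        }
    }

    /** createEphemeral routed through the retry loop, as in the scenario above. */
    void createEphemeral(String path) {
        retryUntilConnected(() -> {
            create(path);
            return null;
        });
    }

    /** True when the znode exists and belongs to the session the client holds now. */
    boolean ownedByCurrentSession(String path) {
        Long owner = ephemeralOwner.get(path);
        return owner != null && owner == currentSessionId;
    }

    public static void main(String[] args) {
        RetryAcrossSessionsSketch zk = new RetryAcrossSessionsSketch();
        zk.expireSessionOnNextCall();
        zk.createEphemeral("/broker/1"); // issued in session 1, lands in session 2
        try {
            zk.createEphemeral("/broker/1"); // the callback for session 2 tries again
        } catch (NodeExistsException e) {
            System.out.println("conflict; owned by current session: "
                    + zk.ownedByCurrentSession("/broker/1"));
        }
    }
}
```

In this model the second create hits a conflict on a znode that the current 
session itself owns, so waiting for an older session to expire would never 
resolve it. Comparing the znode's owner with the current session id, as the 
hypothetical ownedByCurrentSession does, is one way a caller could tell this 
case apart from a leftover node of a dead session.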

> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>            Priority: Blocker
>              Labels: newbie, patch, zkclient-problems
>         Attachments: kafka-1387.patch
>
>
> Kafka broker re-registers itself in zookeeper every time handleNewSession() 
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by 
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create /broker/[id] path using 
> createEphemeralPathExpectConflictHandleZKBug() function. But the path 
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
> stuck in an infinite loop.
> Seems like the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the 
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.



