[ 
https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728905#comment-13728905
 ] 

Neha Narkhede commented on KAFKA-992:
-------------------------------------

Thanks for patch v3. A few more review comments:

6. We should get the session timeout from KafkaConfig instead of hardcoding it.
7. It seems like the return should actually be moved inside the try block. That 
is the only time we don't want to retry, since the operation was successful.
8. You are right about createEphemeralPathExpectConflict. It already handles 
3.1 (in my comments above).
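
To illustrate comments 6 and 7, here is a minimal Java sketch, not the patch 
itself. The class RetryUntilSuccess, the ConflictException type, and the 
backoffMs and maxAttempts parameters are hypothetical names made up for this 
example. The backoff is passed in by the caller, who would read it from 
configuration rather than hardcode it, and the return sits inside the try 
block so that success is the only path that skips the retry.

```java
import java.util.function.Supplier;

// Hypothetical helper, for illustration only; not part of Kafka.
class RetryUntilSuccess {
    /** Stand-in for a conflict such as a NodeExists error. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    /**
     * Runs op until it succeeds or maxAttempts is reached.
     * backoffMs is supplied by the caller (assumed to come from config).
     */
    static <T> T retry(Supplier<T> op, long backoffMs, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Return inside the try: a successful call leaves the loop,
                // every other outcome falls through to another attempt.
                return op.get();
            } catch (ConflictException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleepQuietly(backoffMs);
                }
            }
        }
        throw last; // all attempts hit the conflict
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

The attempt cap is an addition for the sketch so that it always terminates; 
whether the real code should retry without a bound is a separate question.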

This bug is serious: it can halt correct operation of a 0.8 cluster. In a 
typical production deployment of Kafka, where many consumers write offsets to 
the same zookeeper cluster that the 0.8 cluster is connected to, there is a 
higher risk of hitting this bug. You can always increase the session timeout 
enough to get around it. However, in that case, if a broker crashes or has to 
be killed, it takes as long as the session timeout for the consumers to 
recover. We have hit this bug in production at LinkedIn several times and have 
also had to kill 0.8 brokers due to bugs in controlled shutdown (KAFKA-999). 

I understand that we want to stop taking patches on 0.8. We are still on 0.8 
beta in open source. Until trunk is ready to be released, companies that have 
Kafka 0.8-beta running in production can run into blocker bugs (KAFKA-992, 
KAFKA-999). What release pattern can we follow here? Does it make sense to 
take only critical fixes on 0.8 and leave other changes to trunk? That would 
allow critical bug fixes to go to production before 0.8.1 is ready for release.

> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Neha Narkhede
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-992.v1.patch, KAFKA-992.v2.patch, 
> KAFKA-992.v3.patch
>
>
> In zookeeper's current behavior, session expiration and ephemeral node 
> deletion are not a single atomic operation. 
> The side effect of this in Kafka, for certain corner cases, is that 
> ephemeral nodes can be lost even if the session is not expired. The 
> sequence of events that can lead to lost ephemeral nodes is as follows:
> 1. The session expires on the client. The client assumes the ephemeral 
> nodes are deleted, so it establishes a new session with zookeeper and tries 
> to re-create the ephemeral nodes. 
> 2. However, when it tries to re-create the ephemeral node, zookeeper throws 
> back a NodeExists error code. This is legitimate during a session 
> disconnect event (since zkclient automatically retries the operation and 
> raises a NodeExists error). Also, by design, Kafka server doesn't have 
> multiple zookeeper clients create the same ephemeral node, so Kafka server 
> assumes the NodeExists is normal. 
> 3. However, after a few seconds zookeeper deletes that ephemeral node. So 
> from the client's perspective, even though the client has a new valid 
> session, its ephemeral node is gone.
> This behavior is triggered by very long fsync operations on the zookeeper 
> leader. When the leader wakes up from such a long fsync operation, it has 
> several sessions to expire, and the time between the session expiration and 
> the ephemeral node deletion is magnified. Between these 2 operations, a 
> zookeeper client can issue an ephemeral node creation operation that appears 
> to have succeeded, but the leader later deletes the ephemeral node, leading 
> to permanent ephemeral node loss from the client's perspective. 
> Thread from zookeeper mailing list: 
> http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results
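
The failure sequence in the description, and one way a double check can guard 
against it, can be sketched with a toy in-memory store. This is only an 
illustration under stated assumptions, not the code in the attached patches: 
FakeStore stands in for zookeeper, and register, backoff and maxAttempts are 
hypothetical names. On a NodeExists conflict the caller reads the node back. 
If the data is its own, the node is treated as a leftover from the expired 
session, so the caller backs off and tries again until the stale node is gone 
and the create really succeeds.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration; FakeStore is not a zookeeper client.
class EphemeralDoubleCheck {
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("node exists: " + path); }
    }

    /** In-memory stand-in for zookeeper, holding path -> data. */
    static class FakeStore {
        private final Map<String, String> nodes = new HashMap<>();

        void create(String path, String data) {
            if (nodes.containsKey(path)) throw new NodeExistsException(path);
            nodes.put(path, data);
        }

        String read(String path) { return nodes.get(path); }

        void delete(String path) { nodes.remove(path); }
    }

    /**
     * Creates the node, double-checking on conflict.
     * Returns the number of create attempts that were needed.
     */
    static int register(FakeStore store, String path, String data,
                        Runnable backoff, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.create(path, data);
                return attempt; // created under the current session
            } catch (NodeExistsException e) {
                String existing = store.read(path);
                if (existing == null) {
                    continue; // node vanished between create and read
                }
                if (!existing.equals(data)) {
                    // Someone else owns the path: a real conflict.
                    throw new IllegalStateException(
                        "path " + path + " is held by another writer", e);
                }
                // Our own data: assume a stale node from the old session
                // and wait for the delayed deletion before retrying.
                backoff.run();
            }
        }
        throw new IllegalStateException(
            "gave up registering " + path + " after " + maxAttempts + " attempts");
    }
}
```

In a real client the backoff would be a sleep on the order of the session 
timeout. A test can instead pass a Runnable that deletes the stale node to 
simulate the leader's delayed cleanup.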
