[ 
https://issues.apache.org/jira/browse/KAFKA-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942403#comment-13942403
 ] 

Neha Narkhede commented on KAFKA-1155:
--------------------------------------

[~jkreps] Yes, that is how ZooKeeper works, but it is not how Kafka assumes 
it works. In other words, Kafka's behavior needs to change to account for 
that ZooKeeper guarantee. I filed this bug to make those changes on the Kafka 
server. Unfortunately, this might turn out to be a pretty large change :(

> Kafka server can miss zookeeper watches during long zkclient callbacks
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-1155
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1155
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.0, 0.8.1
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>
> When a ZooKeeper watch fires, zkclient invokes the blocking user callback 
> and re-registers the watch only after the callback returns. This leaves a 
> potentially large window during which Kafka has no watch registered on the 
> desired ZooKeeper paths and can therefore miss important state changes (on 
> the controller). Note that even though ZooKeeper has a read-and-set-watch 
> API, there is always a window of time between the watch firing, the callback 
> running, and the read-and-set-watch API call. Because of the zkclient 
> wrapper, it is difficult to handle this properly in the Kafka code unless we 
> use the ZooKeeper client directly. One way around this issue is to keep 
> timestamps on the paths and, when a watch fires, check whether the timestamp 
> in ZooKeeper differs from the one held by the callback handler.
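
To illustrate the window described above, here is a minimal sketch in Python. 
It does not use the real ZooKeeper or zkclient APIs: ToyStore, Listener and 
run are hypothetical names, the store is an in-memory stand-in with one-shot 
watches, and a per-path version counter stands in for the timestamp mentioned 
in the description. The naive listener re-registers its watch only after the 
callback returns and misses a write that lands during the callback; the 
checking listener re-reads while setting the watch and compares versions.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a store with one-shot watches."""

    def __init__(self):
        self.data = {}     # path -> (value, version)
        self.watches = {}  # path -> callback, fired at most once

    def set(self, path, value):
        _, version = self.data.get(path, (None, 0))
        self.data[path] = (value, version + 1)
        # One-shot semantics: the watch is removed before it fires.
        callback = self.watches.pop(path, None)
        if callback is not None:
            callback(path)

    def read(self, path):
        return self.data.get(path, (None, 0))

    def watch(self, path, callback):
        self.watches[path] = callback

    def get_and_watch(self, path, callback):
        # Read-and-set-watch in one step.
        self.watch(path, callback)
        return self.read(path)


class Listener:
    """Hypothetical listener; check_version enables the workaround."""

    def __init__(self, store, path, check_version, during_callback=None):
        self.store = store
        self.path = path
        self.check_version = check_version
        # Simulates a write that arrives while the callback is running.
        self.during_callback = during_callback
        self.seen = []
        _, self.last_version = store.get_and_watch(path, self.on_watch)

    def handle(self, value, version):
        self.seen.append(value)
        self.last_version = version
        hook, self.during_callback = self.during_callback, None
        if hook is not None:
            hook()  # no watch is registered at this point

    def on_watch(self, path):
        value, version = self.store.read(path)
        self.handle(value, version)
        if not self.check_version:
            # Naive: re-register only after the callback, without re-reading.
            self.store.watch(path, self.on_watch)
            return
        # Workaround: re-read while setting the watch and compare versions,
        # repeating until nothing changed since the last handled state.
        while True:
            value, version = self.store.get_and_watch(path, self.on_watch)
            if version == self.last_version:
                break
            self.handle(value, version)


def run(check_version):
    store = ToyStore()
    path = "/controller"
    store.set(path, "broker-1")
    listener = Listener(
        store, path, check_version,
        during_callback=lambda: store.set(path, "broker-3"),
    )
    store.set(path, "broker-2")  # fires the watch
    return listener.seen
```

In this toy model, run(False) handles only the first change even though the 
store ends up holding a later value, while run(True) also picks up the write 
made during the callback. A real fix would still have to deal with callback 
re-entrancy and session events, which this sketch ignores.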



--
This message was sent by Atlassian JIRA
(v6.2#6252)
