[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063503#comment-15063503
 ] 

nijel commented on ZOOKEEPER-2251:
----------------------------------

Hi [~marshad] and [~suda],

I observed this while doing reliability testing for a banking customer.
In these tests we inject network abnormalities and packet drops.

In this scenario, a packet is sent and the client waits forever for its
response. The same issue can occur whenever the server stops responding, for
any reason.

So, in my opinion, we should add this timeout, since the high-availability
solutions of many services depend on ZooKeeper.
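A minimal sketch, in Java, of the difference between an unbounded wait and a bounded one. It assumes the synchronous call blocks in a `while (!finished) wait();` style loop on the queued packet. `PendingPacket` and `awaitFinished` are hypothetical names for illustration, not ZooKeeper's actual classes or API.

```java
import java.util.concurrent.TimeoutException;

public class BoundedPacketWait {

    /** Hypothetical stand-in for a queued request; NOT ZooKeeper's real Packet class. */
    static final class PendingPacket {
        boolean finished; // guarded by this

        synchronized void finish() {
            finished = true;
            notifyAll(); // wake any thread blocked in awaitFinished
        }
    }

    /**
     * Hypothetical helper: waits until the packet is finished or timeoutMs elapses.
     * Unlike a bare `while (!finished) wait();` loop, this cannot block forever
     * when the response packet is lost.
     */
    static void awaitFinished(PendingPacket packet, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (packet) {
            while (!packet.finished) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("No response within " + timeoutMs + " ms");
                }
                // wait(0) would wait forever, so always wait at least 1 ms.
                long remainingMs = Math.max(1L, remainingNanos / 1_000_000L);
                packet.wait(remainingMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Response arrives: another thread finishes the packet after 50 ms.
        PendingPacket answered = new PendingPacket();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            answered.finish();
        }).start();
        awaitFinished(answered, 1_000);
        System.out.println("answered: finished");

        // Response is "lost": nobody ever calls finish().
        PendingPacket lost = new PendingPacket();
        try {
            awaitFinished(lost, 200);
        } catch (TimeoutException e) {
            System.out.println("lost: " + e.getMessage());
        }
    }
}
```

The deadline is computed once with `System.nanoTime()` so that spurious wakeups do not extend the total wait.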



> Add Client side packet response timeout to avoid infinite wait.
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2251
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: nijel
>            Assignee: Arshad Mohammad
>         Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, 
> ZOOKEEPER-2251-03.patch
>
>
> I came across an issue related to client-side packet response timeouts. In my
> cluster, many packets were dropped over a period of time.
> The ZooKeeper client hung. According to the thread dump, it was waiting for
> the response/ACK to the operation it had performed (the synchronous API is
> used here).
> I am using 
> zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only a few packets were lost, no DISCONNECTED event occurred.
> We need to add a "response timeout" for operations or packets.
> *Comments from [~rakeshr]*
> My observations about the problem:
> * We can use tools like 'Wireshark' to simulate artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and the server's
> response packet is lost. The client will then wait indefinitely.
> https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * We can discuss this problem and possible solutions (adding a packet ACK
> timeout, or a better approach) further in this JIRA.
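One possible shape for the "packet ACK timeout" idea, sketched in Java under stated assumptions: sent-but-unanswered requests are kept in a FIFO queue, and replies on a session arrive in send order, so only the oldest entry needs to be checked. `PendingQueueSweeper`, `SentPacket`, `onSent`, `onReply`, and `sweep` are hypothetical names, not ZooKeeper's API. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PendingQueueSweeper {

    /** Hypothetical record of a request that was sent but not yet answered. */
    static final class SentPacket {
        final int xid;
        final long sentAtNanos;

        SentPacket(int xid, long sentAtNanos) {
            this.xid = xid;
            this.sentAtNanos = sentAtNanos;
        }
    }

    private final Deque<SentPacket> pending = new ArrayDeque<>();
    private final long timeoutNanos;

    PendingQueueSweeper(long timeoutMs) {
        this.timeoutNanos = timeoutMs * 1_000_000L;
    }

    synchronized void onSent(int xid, long nowNanos) {
        pending.addLast(new SentPacket(xid, nowNanos));
    }

    /** Assumes in-order replies, so a reply must match the head of the queue. */
    synchronized void onReply(int xid) {
        SentPacket head = pending.pollFirst();
        if (head == null || head.xid != xid) {
            throw new IllegalStateException("Unexpected reply for xid " + xid);
        }
    }

    /**
     * Called periodically. If the oldest request has waited longer than the
     * timeout, fail every pending request: with ordered replies, none of the
     * later ones can complete before the stuck one. Returns the failed xids.
     */
    synchronized List<Integer> sweep(long nowNanos) {
        List<Integer> failed = new ArrayList<>();
        SentPacket head = pending.peekFirst();
        if (head != null && nowNanos - head.sentAtNanos > timeoutNanos) {
            while (!pending.isEmpty()) {
                failed.add(pending.pollFirst().xid);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        PendingQueueSweeper q = new PendingQueueSweeper(100); // 100 ms timeout
        q.onSent(1, 0L);
        q.onSent(2, 10_000_000L);                   // sent 10 ms later
        q.onReply(1);                               // xid 1 answered normally
        System.out.println(q.sweep(50_000_000L));   // prints [] (nothing expired)
        System.out.println(q.sweep(200_000_000L));  // prints [2] (reply to xid 2 lost)
    }
}
```

What the client should do with the failed requests (for example, report an error and treat the situation like a lost connection) is left open here, as in the discussion above.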



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
