[
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990303#comment-12990303
]
Camille Fournier commented on ZOOKEEPER-922:
--------------------------------------------
Sorry for the delay in responding.
Yes, "a way to mark a client as failed faster when compelling evidence
indicates that the client has failed" is exactly right.
When you say:
"you only get the connection reset if you try to send something to the machine,
so unless a watch triggers or there was a request outstanding that completes
after the process fails, you will not get a reset."
Do you mean in the case where the machine fails?
In general, I don't mind if we don't fail fast in the cases of a machine
failing or a network cable being unplugged. It is more important to me to catch
the majority of failure cases (process failure) than the rarer cases of machine
failure.
Presuming that we would detect a machine failure by sending a reverse ping of
some sort, my biggest concern would be additional network traffic. Anything
more complex than pinging every minSessionTimeout or so would probably turn
this into a major undertaking.
I haven't looked at ZOOKEEPER-702 yet, but I will take a glance.
> enable faster timeout of sessions in case of unexpected socket disconnect
> -------------------------------------------------------------------------
>
> Key: ZOOKEEPER-922
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Reporter: Camille Fournier
> Assignee: Camille Fournier
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-922.patch
>
>
> In the case when a client connection is closed due to a socket error instead
> of the client calling close explicitly, it would be nice to enable the session
> associated with that client to time out faster than the negotiated session
> timeout. This would enable a ZooKeeper ensemble that is acting as a dynamic
> discovery provider to remove ephemeral nodes for crashed clients quickly,
> while allowing for a longer heartbeat-based timeout for Java clients that
> need to do long stop-the-world GC.
> I propose doing this by setting the timeout associated with the crashed
> session to "minSessionTimeout".
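To make the proposal quoted above concrete, here is a minimal sketch of the idea, not the attached patch and not ZooKeeper's actual session tracking code. The class FastExpirySketch and its methods (open, heartbeat, socketError, isExpired) are hypothetical names invented for illustration. It also assumes that a client which reconnects and heartbeats before the shortened deadline gets its full negotiated timeout back, which the issue text does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy session table, not ZooKeeper's real session tracker. */
final class FastExpirySketch {
    private static final class Session {
        final long negotiatedTimeoutMs;
        long expiresAtMs;

        Session(long negotiatedTimeoutMs, long expiresAtMs) {
            this.negotiatedTimeoutMs = negotiatedTimeoutMs;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final long minSessionTimeoutMs;
    private final Map<Long, Session> sessions = new HashMap<>();

    FastExpirySketch(long minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    /** Registers a session with its negotiated (possibly long) timeout. */
    void open(long sessionId, long negotiatedTimeoutMs, long nowMs) {
        sessions.put(sessionId, new Session(negotiatedTimeoutMs, nowMs + negotiatedTimeoutMs));
    }

    /** A normal heartbeat pushes the deadline out by the negotiated timeout. */
    void heartbeat(long sessionId, long nowMs) {
        Session s = sessions.get(sessionId);
        if (s != null) {
            s.expiresAtMs = nowMs + s.negotiatedTimeoutMs;
        }
    }

    /**
     * Socket closed by an error rather than an explicit close: pull the
     * deadline in to now + minSessionTimeout, never pushing it further out.
     */
    void socketError(long sessionId, long nowMs) {
        Session s = sessions.get(sessionId);
        if (s != null) {
            s.expiresAtMs = Math.min(s.expiresAtMs, nowMs + minSessionTimeoutMs);
        }
    }

    /** Unknown sessions are treated as expired. */
    boolean isExpired(long sessionId, long nowMs) {
        Session s = sessions.get(sessionId);
        return s == null || nowMs >= s.expiresAtMs;
    }

    public static void main(String[] args) {
        FastExpirySketch tracker = new FastExpirySketch(4_000);
        tracker.open(1L, 30_000, 0L);      // long timeout to ride out GC pauses
        tracker.socketError(1L, 1_000L);   // connection reset observed at t = 1s
        System.out.println(tracker.isExpired(1L, 4_999L)); // false
        System.out.println(tracker.isExpired(1L, 5_000L)); // true
    }
}
```

In this sketch a session negotiated at 30 seconds expires 4 seconds after the socket error, while a session with no socket error keeps its full heartbeat-based timeout.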