[ https://issues.apache.org/jira/browse/ZOOKEEPER-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538037#comment-17538037 ]
Mate Szalay-Beko edited comment on ZOOKEEPER-3706 at 5/17/22 9:02 AM:
----------------------------------------------------------------------

This was actually cherry-picked to multiple branches already. Usually we haven't created separate Jira tasks / PRs for the cherry-picks (except ZOOKEEPER-4223 for branch-3.6). I updated the fixed versions to: 3.5.10, 3.6.3, 3.7.0, 3.8.0

> ZooKeeper.close() would leak SendThread when the network is broken
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3706
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3706
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.6.0, 3.4.14, 3.5.6
>            Reporter: Pierre Yin
>            Assignee: Pierre Yin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.10, 3.6.3, 3.7.0, 3.8.0
>
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> ZooKeeper.close() may leak the SendThread when the network is broken.
> When the network is broken, the client's SendThread enters a continuous
> reconnect loop. During that loop there is an unsafe point just before
> startConnect() is called. If SendThread.close(), running in another thread,
> hits this unsafe point, startConnect() sleeps for a while and then forcibly
> sets the state to States.CONNECTING, even though SendThread.close() has
> already set it to States.CLOSED. In this case the SendThread never
> terminates, and nothing ever changes the state again.
> Normally, ZooKeeper.close() blocks while waiting for the closeSession packet
> to finish, and stays blocked until the network recovers.
> But if the user has set a request timeout, ZooKeeper.close() stops waiting
> once the timeout expires and invokes SendThread.close() to change the state
> to CLOSED. That is how SendThread.close() can hit the unsafe point.
> Setting a request timeout is very common practice.
> I will propose a patch and send it out later.
> Maybe someone can help review it.
>
> Thanks
>
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
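The race in the description comes down to an unguarded state write: the reconnect path overwrites CLOSED with CONNECTING. Below is a minimal Java sketch of one way to close that window, assuming every state transition goes through a single synchronized method that refuses to move from CLOSED back to CONNECTING. The class SendThreadStateSketch and its methods are a hypothetical, simplified model whose names only echo the description; it is not ZooKeeper's ClientCnxn API and not the patch that was merged.

```java
import java.io.IOException;

// Hypothetical, simplified model of the client's connection state machine.
// NOT ZooKeeper's ClientCnxn; names are illustrative only.
public class SendThreadStateSketch {

    enum State { CONNECTING, CONNECTED, CLOSED }

    private State state = State.CONNECTING;

    // All transitions go through this one synchronized method, so the
    // "already closed?" check and the write are atomic with respect to
    // close() running on another thread.
    synchronized void changeState(State newState) throws IOException {
        if (state == State.CLOSED && newState == State.CONNECTING) {
            throw new IOException("already closed; reconnect not allowed");
        }
        state = newState;
    }

    synchronized State getState() {
        return state;
    }

    // Called from the user's thread, e.g. after close() gives up waiting
    // for the closeSession reply because the request timeout expired.
    void close() {
        try {
            changeState(State.CLOSED);
        } catch (IOException e) {
            // Moving to CLOSED is always permitted, so this is unreachable here.
            throw new IllegalStateException(e);
        }
    }

    // One iteration of the reconnect loop. The buggy pattern described in
    // the issue is an unconditional "state = CONNECTING" at this point.
    // Returns false when the loop should exit instead of reconnecting.
    boolean startConnect() {
        try {
            changeState(State.CONNECTING);
            return true;
        } catch (IOException e) {
            return false; // state is CLOSED: let the thread terminate
        }
    }

    public static void main(String[] args) {
        SendThreadStateSketch cnxn = new SendThreadStateSketch();
        cnxn.close();                            // close() wins the race...
        boolean reconnect = cnxn.startConnect(); // ...so the reconnect is refused
        System.out.println(reconnect + " " + cnxn.getState()); // prints "false CLOSED"
    }
}
```

Because the check and the write happen under the same lock, either close() lands first and the reconnect is refused, or the reconnect lands first and close() then overwrites it with CLOSED; in neither order can CONNECTING replace CLOSED. A real loop would still need to re-check the state after each sleep and exit once it sees CLOSED.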