[
https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander A. Strelets updated ZOOKEEPER-2894:
---------------------------------------------
Description:
ZooKeeper C Client *+single thread+* build
*The problem:*
First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in
two ways:
a) from a ZooKeeper callback handler (completion or watcher) which in turn is
called through _zookeeper_process()_
b) and from other places -- i.e., when the call-stack does not pass through any
of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_
The issue described here below is +specific only to the case (b)+. So, it's Ok
with the case (a).
When _zookeeper_close()_ is called in the (b) way, the following happens:
1. +If there are requests waiting for responses in _zh.sent_requests_ queue+,
they all are removed from this queue and each of them is "completed" with
personal fake response having status ZCLOSING. Such fake responses are put into
_zh.completions_to_process_ queue. It's Ok
2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither
completion callbacks are called, nor dynamic memory allocated for fake
responses is freed+*
3. Different structures within _zh_ are dismissed and finally _zh_ is freed
This is illustrated on the screenshot attached to this ticket: you may see that
the next instruction to execute will be _free(zh)_ while
_zh.completions_to_process_ queue is not empty (see the "Variables" tab to the
right).
Alternatively, the same situation but in the case (a) is handled properly --
i.e., all completion callback handlers are truly called with ZCLOSING and the
memory is freed, both for subcases (a.1) when there is a failure like
connection-timeout, connection-closed, etc., or (a.2) there is not failure. The
reason is that any callback handler (completion or watcher) in the case (a) is
called from the _process_completions()_ function which runs in the loop until
_zh.completions_to_process_ queue gets empty. So, this function guarantees this
queue to be completely processed even if new completions occur during reaction
on previously queued completions.
*Consequently:*
1. At least there is definitely the +memory leak+ in the case (b) -- all the
fake responses put into _zh.completions_to_process_ queue are lost after
_free(zh)_
2. And it looks like a great misbehavior not to call completions on sent
requests in the case (b) while they are called with ZCLOSING in the case (a) --
so, I think it's not "by design" but a +completions leak+
+To reproduce the case (b) do the following:+
- open ZooKeeper session, connect to a server, receive and process
connected-watch, etc.
- then somewhere +from the main events loop+ call for example _zoo_acreate()_
with valid arguments -- it shall return ZOK
- then, +immediately after it returned+, call _zookeeper_close()_
- note that completion callback handler for _zoo_acreate()_ *will not be called*
+To reproduce the case (a) do the following:+
- the same as above, open ZooKeeper session, connect to a server, receive and
process connected-watch, etc.
- the same as above, somewhere from the main events loop call for example
_zoo_acreate()_ with valid arguments -- it shall return ZOK
- but now don't call _zookeeper_close()_ immediately -- wait for completion
callback on the commenced request
- when _zoo_acreate()_ completes, +from within its completion callback
handler+, call another _zoo_acreate()_ and immediately after it returned call
_zookeeper_close()_
- note that completion callback handler for the second _zoo_acreate()_ *will be
called with ZCLOSING, unlike the case (b) described above*
*To fix this I propose:*
Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done
in _handle_error(zhandle_t *zh,int rc)_.
This is a proposed fix: https://github.com/apache/zookeeper/pull/1000
// Previously proposed fix: https://github.com/apache/zookeeper/pull/363
[upd]
There are another tickets with about the same problem: ZOOKEEPER-1493,
ZOOKEEPER-2073 (the "same" just according to their titles).
However, as I can see, the corresponding patches were applied on the branch
3.4.10, but the effect still persists -- so, this ticket does not duplicate the
mentioned two.
was:
ZooKeeper C Client *+single thread+* build
*The problem:*
First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in
two ways:
a) from a ZooKeeper callback handler (completion or watcher) which in turn is
called through _zookeeper_process()_
b) and from other places -- i.e., when the call-stack does not pass through any
of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_
The issue described here below is +specific only to the case (b)+. So, it's Ok
with the case (a).
When _zookeeper_close()_ is called in the (b) way, the following happens:
1. +If there are requests waiting for responses in _zh.sent_requests_ queue+,
they all are removed from this queue and each of them is "completed" with
personal fake response having status ZCLOSING. Such fake responses are put into
_zh.completions_to_process_ queue. It's Ok
2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither
completion callbacks are called, nor dynamic memory allocated for fake
responses is freed+*
3. Different structures within _zh_ are dismissed and finally _zh_ is freed
This is illustrated on the screenshot attached to this ticket: you may see that
the next instruction to execute will be _free(zh)_ while
_zh.completions_to_process_ queue is not empty (see the "Variables" tab to the
right).
Alternatively, the same situation but in the case (a) is handled properly --
i.e., all completion callback handlers are truly called with ZCLOSING and the
memory is freed, both for subcases (a.1) when there is a failure like
connection-timeout, connection-closed, etc., or (a.2) there is not failure. The
reason is that any callback handler (completion or watcher) in the case (a) is
called from the _process_completions()_ function which runs in the loop until
_zh.completions_to_process_ queue gets empty. So, this function guarantees this
queue to be completely processed even if new completions occur during reaction
on previously queued completions.
*Consequently:*
1. At least there is definitely the +memory leak+ in the case (b) -- all the
fake responses put into _zh.completions_to_process_ queue are lost after
_free(zh)_
2. And it looks like a great misbehavior not to call completions on sent
requests in the case (b) while they are called with ZCLOSING in the case (a) --
so, I think it's not "by design" but a +completions leak+
+To reproduce the case (b) do the following:+
- open ZooKeeper session, connect to a server, receive and process
connected-watch, etc.
- then somewhere +from the main events loop+ call for example _zoo_acreate()_
with valid arguments -- it shall return ZOK
- then, +immediately after it returned+, call _zookeeper_close()_
- note that completion callback handler for _zoo_acreate()_ *will not be called*
+To reproduce the case (a) do the following:+
- the same as above, open ZooKeeper session, connect to a server, receive and
process connected-watch, etc.
- the same as above, somewhere from the main events loop call for example
_zoo_acreate()_ with valid arguments -- it shall return ZOK
- but now don't call _zookeeper_close()_ immediately -- wait for completion
callback on the commenced request
- when _zoo_acreate()_ completes, +from within its completion callback
handler+, call another _zoo_acreate()_ and immediately after it returned call
_zookeeper_close()_
- note that completion callback handler for the second _zoo_acreate()_ *will be
called with ZCLOSING, unlike the case (b) described above*
*To fix this I propose:*
Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done
in _handle_error(zhandle_t *zh,int rc)_.
This is a proposed fix: https://github.com/apache/zookeeper/pull/363
[upd]
There are another tickets with about the same problem: ZOOKEEPER-1493,
ZOOKEEPER-2073 (the "same" just according to their titles).
However, as I can see, the corresponding patches were applied on the branch
3.4.10, but the effect still persists -- so, this ticket does not duplicate the
mentioned two.
> Memory and completions leak on zookeeper_close
> ----------------------------------------------
>
> Key: ZOOKEEPER-2894
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.4.10
> Environment: Linux ubuntu 4.4.0-87-generic
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> https://github.com/apache/zookeeper.git
> branch-3.4
> Reporter: Alexander A. Strelets
> Assignee: Alexander A. Strelets
> Priority: Critical
> Labels: easyfix, pull-request-available
> Fix For: 3.4.10
>
> Attachments: zk-will-free-zh-and-lose-completions.png
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> ZooKeeper C Client *+single thread+* build
> *The problem:*
> First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in
> two ways:
> a) from a ZooKeeper callback handler (completion or watcher) which in turn is
> called through _zookeeper_process()_
> b) and from other places -- i.e., when the call-stack does not pass through
> any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_
> The issue described here below is +specific only to the case (b)+. So, it's
> Ok with the case (a).
> When _zookeeper_close()_ is called in the (b) way, the following happens:
> 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+,
> they all are removed from this queue and each of them is "completed" with
> personal fake response having status ZCLOSING. Such fake responses are put
> into _zh.completions_to_process_ queue. It's Ok
> 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither
> completion callbacks are called, nor dynamic memory allocated for fake
> responses is freed+*
> 3. Different structures within _zh_ are dismissed and finally _zh_ is freed
> This is illustrated on the screenshot attached to this ticket: you may see
> that the next instruction to execute will be _free(zh)_ while
> _zh.completions_to_process_ queue is not empty (see the "Variables" tab to
> the right).
> Alternatively, the same situation but in the case (a) is handled properly --
> i.e., all completion callback handlers are truly called with ZCLOSING and the
> memory is freed, both for subcases (a.1) when there is a failure like
> connection-timeout, connection-closed, etc., or (a.2) there is not failure.
> The reason is that any callback handler (completion or watcher) in the case
> (a) is called from the _process_completions()_ function which runs in the
> loop until _zh.completions_to_process_ queue gets empty. So, this function
> guarantees this queue to be completely processed even if new completions
> occur during reaction on previously queued completions.
> *Consequently:*
> 1. At least there is definitely the +memory leak+ in the case (b) -- all the
> fake responses put into _zh.completions_to_process_ queue are lost after
> _free(zh)_
> 2. And it looks like a great misbehavior not to call completions on sent
> requests in the case (b) while they are called with ZCLOSING in the case (a)
> -- so, I think it's not "by design" but a +completions leak+
> +To reproduce the case (b) do the following:+
> - open ZooKeeper session, connect to a server, receive and process
> connected-watch, etc.
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_
> with valid arguments -- it shall return ZOK
> - then, +immediately after it returned+, call _zookeeper_close()_
> - note that completion callback handler for _zoo_acreate()_ *will not be
> called*
> +To reproduce the case (a) do the following:+
> - the same as above, open ZooKeeper session, connect to a server, receive and
> process connected-watch, etc.
> - the same as above, somewhere from the main events loop call for example
> _zoo_acreate()_ with valid arguments -- it shall return ZOK
> - but now don't call _zookeeper_close()_ immediately -- wait for completion
> callback on the commenced request
> - when _zoo_acreate()_ completes, +from within its completion callback
> handler+, call another _zoo_acreate()_ and immediately after it returned call
> _zookeeper_close()_
> - note that completion callback handler for the second _zoo_acreate()_ *will
> be called with ZCLOSING, unlike the case (b) described above*
> *To fix this I propose:*
> Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done
> in _handle_error(zhandle_t *zh,int rc)_.
> This is a proposed fix: https://github.com/apache/zookeeper/pull/1000
> // Previously proposed fix: https://github.com/apache/zookeeper/pull/363
> [upd]
> There are another tickets with about the same problem: ZOOKEEPER-1493,
> ZOOKEEPER-2073 (the "same" just according to their titles).
> However, as I can see, the corresponding patches were applied on the branch
> 3.4.10, but the effect still persists -- so, this ticket does not duplicate
> the mentioned two.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)