Hi,
We're encountering an occasional problem which means calls to ns_proxy hang
forever.
It seems to be when an ns_proxy eval command timeouts during evaluation &
it produces a large amount of output.
Under those circumstances ns_proxy becomes inoperable.
Following is a contrived test case which I think demonstrates what we are
experiencing.
(The ns_proxy timeout here is set unrealistically small to force the
timeout - in production the timeout is 1 second)
Normal test case using command:
ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K count=23 | base64] 1
% set handle [ns_proxy get exec]
[07/Apr/2017:13:06:57][28009.7fa7b760b700][-command-] Debug(nsproxy): set
expire in 300000 ms for pool exec slave 28197
[07/Apr/2017:13:06:57][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy: slave 28197 started
[07/Apr/2017:13:06:57][28197.7f3f5dcc3700][-main-] Notice: OpenSSL 1.0.1t
3 May 2016 initialized
[07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Notice: starting
[07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper run
[07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper checks pool exec
[07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper sets timeout based on idle diff 1491567117.835940 of pool exec
[07/Apr/2017:13:06:57][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper waits for cond with timeout 1491567117.835940
exec-7
% set result [ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K
count=23 | base64] 1]
[07/Apr/2017:13:07:09][28009.7fa7b760b700][-command-] Debug(nsproxy): proxy
send pool exec slave 28197: exec dd if=/dev/urandom bs=1K count=23 | base64
could not wait for proxy "exec-7": timeout waiting for evaluation
% ns_proxy active exec
{handle exec-7 slave 28197 start 1491566829:518164 script {exec dd
if=/dev/urandom bs=1K count=23 | base64}}
% ns_proxy cleanup
[07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy [exec]: close slave 28197 (expire 1000 ms)
[07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy): set
expire in 1000 ms for pool exec slave 28197
[07/Apr/2017:13:07:26][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy [exec]: slave 28197 closed
[07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper run
[07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper checks pool exec
[07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper sets timeout based on idle diff 1491567146.613112 of pool exec
[07/Apr/2017:13:07:26][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper waits for cond with timeout 1491567146.613112
Hanging test case using command (note: same command except with slightly
larger output):
ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K count=24 | base64] 1
% set handle [ns_proxy get exec]
[07/Apr/2017:13:08:13][28009.7fa7b760b700][-command-] Debug(nsproxy): set
expire in 300000 ms for pool exec slave 28968
[07/Apr/2017:13:08:13][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy: slave 28968 started
[07/Apr/2017:13:08:13][28968.7f0059a9e700][-main-] Notice: OpenSSL 1.0.1t
3 May 2016 initialized
[07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper run
[07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper checks pool exec
[07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper sets timeout based on idle diff 1491567193.223279 of pool exec
[07/Apr/2017:13:08:13][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper waits for cond with timeout 1491567193.223279
exec-7
% set result [ns_proxy eval $handle [list exec dd if=/dev/urandom bs=1K
count=24 | base64] 1]
[07/Apr/2017:13:08:21][28009.7fa7b760b700][-command-] Debug(nsproxy): proxy
send pool exec slave 28968: exec dd if=/dev/urandom bs=1K count=24 | base64
could not wait for proxy "exec-7": timeout waiting for evaluation
% ns_proxy active exec
{handle exec-7 slave 28968 start 1491566901:234806 script {exec dd
if=/dev/urandom bs=1K count=24 | base64}}
% ns_proxy cleanup
[07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy [exec]: close slave 28968 (expire 1000 ms)
[07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy): set
expire in 1000 ms for pool exec slave 28968
[07/Apr/2017:13:08:40][28009.7fa7b760b700][-command-] Debug(nsproxy):
nsproxy [exec]: slave 28968 closed
[07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper run
[07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper checks pool exec
[07/Apr/2017:13:08:40][28009.7fa78bfff700][-nsproxy:reap-] Debug(nsproxy):
reaper sets timeout based on idle diff 1491567220.342132 of pool exec
ns_proxy cleanup never returns.
The location of the hang is during waitpid() called via Ns_WaitProcess
<https://bitbucket.org/naviserver/naviserver/src/5b852119ea5f3e3e796eb9385c22162a070a6b24/nsproxy/nsproxylib.c?at=default&fileviewer=file-view-default#nsproxylib.c-2935>
The system seems to think that the process hasn't exited.
Any ideas what is going on here?
Regards,
--
David
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel