Whenever I run a lot of xmlrpc requests, after a while the xmlrpc process becomes unresponsive. It is still there, it doesn't crash, and the rest of openser works as expected, but the xmlrpc process simply doesn't answer any more requests until restarted. The number of requests I have to issue to make this happen varies. Sometimes it takes only 100 requests, other times it needs up to 20000, but the end result is always the same.
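A loop of this kind can be sketched roughly as follows. It is only an illustration, not the actual test script: the URL, port and account name are placeholders, the helper names count_until_stuck and make_call are made up, and the timeout is an assumption added so that a hung server shows up as an error instead of blocking the client forever.

```python
import http.client
import socket
import xmlrpc.client


def count_until_stuck(call, max_requests=20000):
    """Invoke call() repeatedly and return how many calls succeeded
    before one failed or timed out (max_requests if none did)."""
    for done in range(max_requests):
        try:
            call()
        except (OSError, xmlrpc.client.Error, http.client.HTTPException):
            # socket.timeout is a subclass of OSError, so a server that
            # stops answering ends up here once the timeout expires.
            return done
    return max_requests


def make_call(url, account, timeout=5.0):
    """Build a zero-argument callable that sends one refreshWatchers request.

    The URL and account are supplied by the caller; nothing here is
    specific to a real installation.
    """
    # Process-wide default, applied to the sockets xmlrpc.client opens.
    socket.setdefaulttimeout(timeout)
    proxy = xmlrpc.client.ServerProxy(url)
    return lambda: proxy.refreshWatchers(account, "presence", 0)


# Example (hypothetical endpoint and account):
#   call = make_call("http://127.0.0.1:8080/RPC2", "user@example.com")
#   print("requests answered before failure:", count_until_stuck(call))
```

The number it prints is the count of requests answered before the first failure, which is the figure that varies from run to run.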
The xmlrpc request I ran is refreshWatchers(account, 'presence', 0), called repeatedly until it no longer receives a reply. The account is a subscriber that is present in the system. I also tried pwd and ps and got the same result.

Below is the strace right before and while it got stuck. As can be seen, it got stuck in futex(), which seems to indicate some sort of locking mechanism that has entered a deadlock. futex() is not called anywhere before that point in the whole strace output, and the first time it is called it blocks.

--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4353
close(9) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38926), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4354
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38928), sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4354
close(8) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38928), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4355
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38930), sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4355
close(9) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38930), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4356
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38932), sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4356
close(8) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38932), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4357
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38934), sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4357
close(9) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38934), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4358
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38936), sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4358
close(8) = 0
waitpid(-1, 0xbfdcca6c, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38936), sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7c69bc8) = 4359
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38938), sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
kill(4359, SIG_0) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4359
close(9) = 0
futex(0xb7dc9140, FUTEX_WAIT, 2, NULL

--
Dan

_______________________________________________
Devel mailing list
Devel@lists.openser.org
http://lists.openser.org/cgi-bin/mailman/listinfo/devel