Hi again,

On 03.07.2013 17:38, Maxim Dounin wrote:
Hello!

On Wed, Jul 03, 2013 at 04:48:29PM +0200, Florian S. wrote:

Hi together!

I'm occasionally having trouble with worker processes left <defunct>
and with nginx no longer handling signals (HUP and even TERM) at all.

Upon a reconfiguration signal, the log shows four new worker processes
being spawned while the old four shut down:

[notice] 5159#0: using the "epoll" event method
[notice] 5159#0: nginx/1.4.1
[notice] 5159#0: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
[notice] 5159#0: OS: Linux 3.9.7-147-x86
[notice] 5159#0: getrlimit(RLIMIT_NOFILE): 100000:100000
[notice] 5159#0: start worker processes
[notice] 5159#0: start worker process 5330
[notice] 5159#0: start worker process 5331
[notice] 5159#0: start worker process 5332
[notice] 5159#0: start worker process 5333
[notice] 5159#0: signal 1 (SIGHUP) received, reconfiguring
[notice] 5159#0: reconfiguring
[notice] 5159#0: using the "epoll" event method
[notice] 5159#0: start worker processes
[notice] 5159#0: start worker process 12457
[notice] 5159#0: start worker process 12458
[notice] 5159#0: start worker process 12459
[notice] 5159#0: start worker process 12460
[notice] 5159#0: start cache manager process 12461
[notice] 5159#0: start cache loader process 12462
[notice] 5331#0: gracefully shutting down
[notice] 5330#0: gracefully shutting down
[notice] 5331#0: exiting
[notice] 5330#0: exiting
[notice] 5331#0: exit
[notice] 5330#0: exit
[notice] 5332#0: gracefully shutting down
[notice] 5159#0: signal 17 (SIGCHLD) received
[notice] 5159#0: worker process 5331 exited with code 0
[notice] 5332#0: exiting
[notice] 5332#0: exit
[notice] 5333#0: gracefully shutting down
[notice] 5333#0: exiting
[notice] 5333#0: exit

After that, nginx is fully operational and serving requests --
however, ps yields:

root    5159 0.0 0.0 6248 1696 ?     Ss 10:43 0:00 nginx: master
process /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf
nobody  5330 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
nobody  5332 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
nobody  5333 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
nobody 12457 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
nobody 12458 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
nobody 12459 0.0 0.0 8332 3544 ?     S  10:44 0:00 nginx: worker process
nobody 12460 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
nobody 12461 0.0 0.0 6296 1068 ?     S  10:44 0:00 nginx: cache
manager process
nobody 12462 0.0 0.0    0    0 ?     Z  10:44 0:00 [nginx] <defunct>

In the log one can see that SIGCHLD is received only once, for 5331,
which is the only exited process that does not show up as a zombie --
in contrast to the workers 5330, 5332 and 5333, and the cache loader 12462.
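(Just to make the zombie part explicit: as far as I understand, an exited
child stays <defunct> until its parent collects the exit status with
waitpid(), and a single SIGCHLD delivery can stand for several exited
children, so they have to be reaped in a loop. A generic stand-alone
sketch, nothing nginx-specific:

/* generic illustration (not nginx code): children stay <defunct>
 * until the parent reaps them; reaping with WNOHANG in a loop picks
 * up every exited child even if several SIGCHLDs were merged into
 * one delivery */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;
    int   status;

    for (int i = 0; i < 3; i++) {
        if (fork() == 0) {
            _exit(0);               /* child exits immediately */
        }
    }

    sleep(1);                       /* children are now zombies: ps shows <defunct> */

    /* reap all exited children; one SIGCHLD may stand for many */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        printf("reaped child %d, status %d\n", (int) pid, status);
    }

    return 0;
}

So if the master stops reaping after the first child, the remaining
workers pile up exactly as in the ps output above.)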
Much more serious is that neither

/chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s(stop|reload)

nor

kill 5159

seems to be handled by nginx anymore (nothing in the log and no
effect). Maybe the master process is stuck waiting on some mutex?:

strace -p 5159
Process 5159 attached - interrupt to quit
futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL

Unfortunately, I failed to get a core dump of the master process
while it was running. Additionally, there is no debug log available,
sorry. As I was not able to reproduce this issue reliably, I'll most
probably have to wait...

It indeed looks like the master process is blocked somewhere.  It
would be interesting to see a stack trace of the master process when
this happens.

(It's also a good idea to make sure there are no 3rd party
modules/patches, just in case.)


Thanks for your quick reply.
I finally managed to get a core dump (I killed the master process with signal 11 in order to force the dump, that's why gdb reports the segfault):

Program terminated with signal 11, Segmentation fault.
#0  0xb772c430 in dl_main (phdr=0x5, phnum=1, user_entry=0x80a97f9, 
auxv=0xbfd0956c) at rtld.c:1751
1751    rtld.c: No such file or directory.
(gdb) bt
#0  0xb772c430 in dl_main (phdr=0x5, phnum=1, user_entry=0x80a97f9, 
auxv=0xbfd0956c) at rtld.c:1751
#1  0xb7523bc6 in ?? ()
#2  0x00000005 in ?? ()
#3  0x00000001 in ?? ()
#4  0x080a97f9 in ?? ()
#5  0x0804c370 in syslog (__fmt=0x80a97f9 "%.*s", __pri=<optimized out>) at 
/usr/include/bits/syslog.h:32
#6  ngx_log_error_core (level=6, log=0x967f084, fn=0x80adba2 "ngx_signal_handler", file=0x80ad731 
"src/os/unix/ngx_process.c", line=430, err=0, fmt=0x80ad74b "signal %d (%s) received%s") 
at src/core/ngx_log.c:249
#7  0x0806b890 in ngx_signal_handler (signo=17) at src/os/unix/ngx_process.c:429
#8  0xb772c400 in dl_main (phdr=0x5, phnum=1, user_entry=0x80a97f9, 
auxv=0xbfd0a1ec) at rtld.c:1735
#9  0xb7523bc6 in ?? ()
#10 0x00000005 in ?? ()
#11 0x00000001 in ?? ()
#12 0x080a97f9 in ?? ()
#13 0x0804c370 in syslog (__fmt=0x80a97f9 "%.*s", __pri=<optimized out>) at 
/usr/include/bits/syslog.h:32
#14 ngx_log_error_core (level=6, log=0x967f084, fn=0x80adba2 "ngx_signal_handler", file=0x80ad731 
"src/os/unix/ngx_process.c", line=430, err=0, fmt=0x80ad74b "signal %d (%s) received%s") 
at src/core/ngx_log.c:249
#15 0x0806b890 in ngx_signal_handler (signo=29) at src/os/unix/ngx_process.c:429
#16 0xb772c400 in dl_main (phdr=0xbfd0b0f0, phnum=3218125184, user_entry=0x10, 
auxv=0x967f084) at rtld.c:1735
#17 0x0806f0da in ngx_master_process_cycle (cycle=0x967f078) at 
src/os/unix/ngx_process_cycle.c:169
#18 0x0804b95c in main (argc=3, argv=0xbfd0b394) at src/core/nginx.c:417
(gdb)

Maybe the concurrently running handlers for SIGCHLD and SIGIO (frames #7 and #15) lead to some blocking in dl_main? However, I am not aware of the side effects and the exact purpose of the dynamic linking at this point.

And as you can see from the backtrace, I did not mention that I have the (semi-official?) syslog patch applied. syslog() is not async-signal-safe, so calling it from the signal handler might indeed cause the problem. As you already pointed out, it seems like a good idea to remove this patch and check whether the error persists.
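
In case a stand-alone illustration of that failure mode helps: since
syslog(3) is not on the async-signal-safe list, a handler that calls it
can block forever on libc's internal lock when the signal interrupted
another syslog() call -- which would match both the FUTEX_WAIT in the
strace output and the two nested ngx_signal_handler frames in the
backtrace. A minimal, hypothetical demo (not the patch code, just the
general hazard):

#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

static void handler(int signo)
{
    /* unsafe: syslog() is not async-signal-safe; if this handler
     * interrupted code that is already inside syslog() and therefore
     * holds libc's internal lock, this call blocks forever and the
     * process hangs in FUTEX_WAIT */
    syslog(LOG_NOTICE, "signal %d received", signo);
    alarm(1);                       /* re-arm so the race can be hit eventually */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    openlog("deadlock-demo", LOG_PID, LOG_DAEMON);
    alarm(1);

    for ( ;; ) {
        /* the main flow spends most of its time inside syslog(), so
         * sooner or later the handler fires while the lock is held */
        syslog(LOG_DEBUG, "main loop message");
    }
}

If the master is stuck like this inside a signal handler, that would
also explain why further HUP/TERM signals have no visible effect.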

Kind regards,
Florian

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
