Hello! On Mon, Dec 02, 2013 at 07:15:52AM -0500, itpp2012 wrote:
> Here is a patch for a possible mutex starvation issue which affects all > nginx Linux versions. > Already solved for Windows since nginx 1.5.7.1 Caterpillar. > Can be reproduced when nginx reloads the config & worker holding mutex dies > or hangs. > > Fixed by Vittorio Francesco Digilio, commercially sponsered solution by > ITPP. > target source mainline 1.5.8 - 30-11-2013. > src/event/ngx_event_accept.c, line 402 was correctly found (starvation fix) > but missed in: > src/event/ngx_event.c, line 259-260, 1 line added; > 255 if (ngx_posted_accept_events) { > 256 ngx_event_process_posted(cycle, &ngx_posted_accept_events); > 257 } > 258 > 259 if (ngx_accept_mutex_held) { > --+ ngx_accept_mutex_held=0; > 260 ngx_shmtx_unlock(&ngx_accept_mutex); > 261 } > 262 > 263 if (delta) { > 264 ngx_event_expire_timers(); > 265 } This patch is wrong. The ngx_accept_mutex_held don't need to be unset here. The ngx_accept_mutex_held is used as an idicator that on previous iteration the lock was held by the process, and unsetting it will result in incorrect behaviour. > Also applies to; src/os/win32/ngx_process_cycle.c > line 507-508, 3 lines added; > 504 if (ngx_processes[n].handle != h) { > 505 continue; > 506 } > 507 > --+ if(*ngx_accept_mutex.lock==ngx_processes[n].pid) { > --+ *ngx_accept_mutex.lock=0; > --+ } > 508 if (GetExitCodeProcess(h, &code) == 0) { > 509 ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno, > 510 "GetExitCodeProcess(%P) failed", > 511 ngx_processes[n].pid); > 512 } This patch is also wrong. In the current state of the win32 version it is not assumed that accept mutex is used at all. But if it is, correct aproach to unlock shared memory mutexes on abnormal process termination is to port (or move to platform-independed place) the ngx_unlock_mutexes() function from src/os/unix/ngx_process.c. It correctly uses ngx_shmtx_force_unlock() to properly unlock shared memory mutexes using atomic operations. Note well that unlocking shared memory mutexes on abnormal process termination is an emergency mechanism. If it actually happens, it indicate that something is really wrong in other places of the system. Please also consider reading http://nginx.org/en/docs/contributing_changes.html. -- Maxim Dounin http://nginx.org/en/donation.html _______________________________________________ nginx mailing list nginx@nginx.org http://mailman.nginx.org/mailman/listinfo/nginx