Package: ruby2.3
Version: 2.3.3-1+deb9u1
Severity: important
Tags: upstream patch
Forwarded: https://bugs.ruby-lang.org/issues/13794

Hello,

After the upgrade to stretch we keep finding ruby processes (puppet
agents in particular) stuck in a sched_yield busyloop. The stuck process
is always a forked child of the main puppet agent running inside a
timeout block. The backtrace of the process is the following:

(gdb) thread apply all bt

Thread 2 (Thread 0x7f2dc7904700 (LWP 11226)):
#0  0x00007f2dc63bb6ad in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2dc73fba62 in timer_thread_sleep (gvl=0x5628917b3f28) at thread_pthread.c:1455
#2  thread_timer (p=0x5628917b3f28) at thread_pthread.c:1563
#3  0x00007f2dc7045494 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f2dc63c4aff in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f2dc78fc700 (LWP 11224)):
#0  0x00007f2dc63adca7 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2dc73fbac5 in native_stop_timer_thread () at thread_pthread.c:1664
#2  rb_thread_stop_timer_thread () at thread.c:3902
#3  0x00007f2dc7341c42 in before_exec_non_async_signal_safe () at process.c:1175
#4  before_exec () at process.c:1181
#5  rb_f_exec (argc=<optimized out>, argv=<optimized out>) at process.c:2576

And the offending part of the code is this:

native_stop_timer_thread(void)
{
    int stopped;
    stopped = --system_working <= 0;

    if (TT_DEBUG) fprintf(stderr, "stop timer thread\n");
#if USE_SLEEPY_TIMER_THREAD
    if (stopped) {
        /* prevent wakeups from signal handler ASAP */
        timer_thread_pipe.owner_process = 0;

        /*
         * however, the above was not enough: the FD may already be
         * captured and in the middle of a write while we are running,
         * so wait for that to finish:
         */
        while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
            native_thread_yield();
        }
[..]
}

Thread 1 is spinning around `timer_thread_pipe.writing` because someone
has erroneously bumped it to 1.

(gdb) print timer_thread_pipe
$1 = {normal = {3, 4}, low = {5, 6}, owner_process = 0, writing = 1}

Our case seems identical to this [1] bug report.
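
For illustration, the wait loop quoted above can be reduced to a small
standalone C11 sketch. It assumes that ATOMIC_CAS(var, 0, 0) behaves as
an atomic read returning the current value (as GCC's
__sync_val_compare_and_swap would), and it adds a hypothetical spin
limit so that the demonstration terminates. The names writing,
read_writing and wait_for_writers are made up for this sketch and are
not part of Ruby's source.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static atomic_int writing;

/*
 * Read the counter through a compare-and-swap of 0 -> 0.
 * The stored value never changes; on return `expected` holds
 * whatever value the counter currently has.
 */
static int read_writing(void)
{
    int expected = 0;
    atomic_compare_exchange_strong(&writing, &expected, 0);
    return expected;
}

/*
 * Bounded variant of the wait loop (the original has no bound).
 * Returns the number of yields performed before the counter was
 * seen at 0, or -1 if it was still non-zero after max_spins yields.
 */
static int wait_for_writers(int max_spins)
{
    assert(max_spins >= 0);
    for (int spins = 0; spins <= max_spins; spins++) {
        if (read_writing() == 0)
            return spins;
        sched_yield();  /* same call that shows up in frame #0 of thread 1 */
    }
    return -1;
}
```

With writing at 0 the function returns 0 at once. With writing left at 1
and no writer remaining to decrement it, the limit is exhausted and -1
is returned, which corresponds to the unbounded loop in thread 1 never
exiting.
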
We have applied the patch [2] by Eric Wong and the problem seems
resolved without causing any other problems.

[1] https://bugs.ruby-lang.org/issues/13794
[2] https://80x24.org/spew/20170809232533.14932-...@80x24.org/raw

Kind regards,

-- 
Gregory Potamianos
Skroutz S.A
greg...@skroutz.gr

-- System Information:
Debian Release: 9.0
  APT prefers stable-debug
  APT policy: (500, 'stable-debug'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages ruby2.3 depends on:
ii  libc6                 2.24-11+deb9u1
ii  libgmp10              2:6.1.2+dfsg-1
ii  libruby2.3            2.3.3-1+deb9u1
ii  rubygems-integration  1.11

Versions of packages ruby2.3 recommends:
ii  fonts-lato    2.0-1
ii  libjs-jquery  3.1.1-2

ruby2.3 suggests no packages.

-- no debconf information