On Thu, Sep 26, 2019 at 05:06:51PM +0200, Yann Sionneau wrote:
> Hello,
>
> I would like to know if there is consensus on the fact that this
> program hanging on a deadlock is a uClibc-ng bug:
> https://pastebin.com/11qLsTW5
>
> I've asked people to try this on several arch/libc combinations; it
> works well (no hang) on:
>
> * x86_64/glibc
> * x86_64/musl
> * armv7l/musl
>
> Also, I've run the puts.c test (linked above) on 3 archs with
> uClibc-ng; it fails (hangs) on:
>
> * or1k/uClibc-ng
> * mips32r6/uClibc-ng
> * k1c/uClibc-ng (port not completely published yet)
>
> See failure logs + strace -f of what happens when hanging:
> https://mypads.framapad.org/mypads/?/mypads/group/uclibc-ng-10be37ap/pad/view/puts-issue-qt62e74n
>
> My understanding of the issue is that:
>
> puts, and possibly other libc functions, take a lock
> ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/puts.c#L17 )
> and end up calling write(), which is a cancellation point.
> ( https://elixir.bootlin.com/uclibc-ng/latest/source/libc/stdio/_stdio.h#L150 )
>
> So, if a thread is canceled in deferred mode (which is the default
> one), and the cancellation is triggered by the write() inside the
> puts(), then the thread will unwind and exit without unlocking the
> puts lock.
>
> Then, any other thread calling puts() will hang indefinitely (and
> hang other threads if it hangs with locks held...).
>
> My understanding of what can be done to fix this issue:
>
> 1/ Either make puts a non-cancellation point (see man 7 pthreads:
> puts is not listed in the mandatory cancellation points, only in
> "may"). For instance, it is not a cancellation point in glibc.
>
> 2/ Or keep puts as a cancellation point and fix the puts code so that
> it releases the lock upon cancellation (using
> pthread_cleanup_push/pop for instance).
>
> In case people think this is indeed a bug, here are examples of code
> fixes that I have in mind; please don't hesitate to comment and/or
> propose something else:
>
> 1/ https://pastebin.com/ePsWJzdi
>
> 2/ https://pastebin.com/5EA4RedS
>
> One problem with those fixes is that we need to identify all libc
> functions that take a lock and call a cancellable function, and apply
> this kind of fix to each of them... This is not easy and a bit
> painful.
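
For illustration only, option 2/ could look roughly like the sketch below. It is not the uClibc-ng code and not the content of either pastebin: stream_lock, locked_reader() and demo_cancel_releases_lock() are hypothetical names, and a blocking read() on an empty pipe stands in for the write() inside puts() so that the thread is reliably parked at a cancellation point while holding the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for the stdio lock taken by puts(). */
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static int pipe_fds[2];

/* Cleanup handler: runs when the thread is cancelled while the handler
 * is pushed, or when it is popped with a non-zero argument. */
static void unlock_stream(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/* Hypothetical puts()-like function: lock, cancellable call, unlock. */
static void *locked_reader(void *unused)
{
    char buf[1];
    ssize_t n;

    (void)unused;
    pthread_mutex_lock(&stream_lock);
    pthread_cleanup_push(unlock_stream, &stream_lock);
    /* read() on an empty pipe blocks; like write(), it is a
     * cancellation point, so a pending cancel is acted on here. */
    n = read(pipe_fds[0], buf, sizeof buf);
    (void)n;
    pthread_cleanup_pop(1); /* normal path: pop and run the handler */
    return NULL;
}

/* Returns 0 if the lock is free again after the holder was cancelled. */
int demo_cancel_releases_lock(void)
{
    pthread_t t;
    void *res = NULL;
    int rc;

    if (pipe(pipe_fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, locked_reader, NULL) != 0)
        return -1;
    pthread_cancel(t);      /* deferred: takes effect at read() */
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    rc = pthread_mutex_trylock(&stream_lock); /* 0 means handler ran */
    if (rc == 0)
        pthread_mutex_unlock(&stream_lock);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return rc;
}
```

Calling demo_cancel_releases_lock() from main() (linked with -pthread) should return 0. Without the pthread_cleanup_push/pop pair, the trylock would be expected to fail with EBUSY, which is the hang described above.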

I think your analysis is correct here. On top of that, though, uclibc
has the broken, inherently racy cancellation implementation inherited
from glibc. See:

- https://sourceware.org/bugzilla/show_bug.cgi?id=12683
- https://ewontfix.com/2/
- https://ewontfix.com/4/
- https://ewontfix.com/16/

As such, even if you fix the above bug, cancellation will be unsafe to
use, and critically unsafe unless you block cancellation around at
least resource-freeing operations like close. I'd go so far as to say
that, if uclibc can't fix this, it should ignore cancellation at
cancellation points which are resource-freeing operations (close,
maybe a few others). Resource-allocating ones are also problematic,
but "just" resource leaks at worst; maybe cancellation should be
ignored for them too.

Rich
_______________________________________________
devel mailing list
https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
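
As an application-side illustration of blocking cancellation around close, here is a minimal sketch. It only shows the calling pattern and is not a proposed libc patch: close_nocancel(), worker() and demo_close_nocancel() are hypothetical names, and only the standard pthread_setcancelstate() and pthread_testcancel() calls are relied on.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper: close() with cancellation disabled, so a
 * pending cancel cannot be acted on in the middle of freeing the fd. */
static int close_nocancel(int fd)
{
    int old_state, ignored, ret;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    ret = close(fd);
    pthread_setcancelstate(old_state, &ignored);
    return ret;
}

static int close_result = -2; /* written by worker(), read after join */

static void *worker(void *arg)
{
    int fd = *(int *)arg;

    pthread_cancel(pthread_self());    /* request is now pending */
    close_result = close_nocancel(fd); /* completes despite the request */
    pthread_testcancel();              /* request is acted on here */
    close_result = -3;                 /* not reached */
    return NULL;
}

/* Returns 0 if the fd was closed exactly once and the thread was
 * cancelled only after close_nocancel() had returned. */
int demo_close_nocancel(void)
{
    int fds[2];
    pthread_t t;
    void *res = NULL;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, &fds[1]) != 0)
        return -1;
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    if (close_result != 0)
        return 1;
    if (close(fds[1]) != -1) /* already closed by the worker */
        return 2;
    close(fds[0]);
    return 0;
}
```

The cancellation request is not lost: it stays pending while the state is PTHREAD_CANCEL_DISABLE and is acted on at the next cancellation point after the previous state is restored, here the explicit pthread_testcancel().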
