On 2023/06/08 9:18, [email protected] wrote: > On Thu, Jun 08, 2023 at 12:03:25AM +0000, Damien Le Moal wrote: >> On 2023/06/07 23:57, Yann Sionneau wrote: >>> Hello uClibc-ng hackers, Damien, Rich, >>> >>> I am sending this email to discuss the possibility to revert >>> https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/commit/?id=08d46f1ce21e4ec51b2b1626beeaea6cbe7fdc6b >>> >>> I think this change is wrong, I have been tracking an issue for a few >>> days and I am pretty sure the issue comes from this commit. >>> >>> Basically what happens is that with this commit, tst-cancel18 test is >>> failing. >>> >>> The test is available over there: >>> https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-cancel18.c >>> >>> The reason for this fail is that clock_nanosleep code from libc is badly >>> compiled: >>> https://elixir.bootlin.com/uclibc-ng/latest/source/librt/clock_nanosleep.c#L39 >>> >>> Because SINGLE_THREAD_P is actually defined, in case your libc build >>> supports NPTL threads (which is, I think, the majority of configs, >>> except very old systems using LINUXTHREADS or NOMMU maybe). >>> >>> In my case (kvx arch): >>> >>> #define SINGLE_THREAD_P __builtin_expect (THREAD_GETMEM (THREAD_SELF, >>> header.multiple_threads) == 0, 1) >>> >>> See: >>> https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/sysdeps/unix/sysv/linux/kvx/sysdep-cancel.h#L28 >>> >>> This results to clock_nanosleep code being compiled to a very simple >>> version that directly calls the syscall and does not change at all the >>> thread cancellability to asynchronous before calling the syscall. >>> >>> Therefore, when the test main thread calls pthread_cancel(), no signal >>> is sent (see >>> https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/pthread_cancel.c#L60) >>> >>> and therefore the nanosleep syscall is never interrupted and the test fails. >>> >>> clock_nanosleep is defined to be a mandatory cancellation point by >>> POSIX, it must be interruptible in all cases by pthread_cancel, whether >>> in synchronous or asynchronous mode. >>> >>> I think the fix for the NOMMU case should be different, maybe checking >>> for if LIBC_CANCEL_ASYNC is defined ? >>> >>> Damien do you have time to work on a new fix for the NOMMU case? Are you >>> OK with the revert? >> >> Not OK with a revert as that will bring back the compilation errors for >> RISC-V >> NOMMU case. >> >> Instead, please work on a fix for your case on top of the current code. If I >> recall correctly, the code around this is rather messy with lots of ifdefs, >> which may be why this regression was introduced. Cleaning up the code first >> to >> avoid that may help avoiding further issues in the future. >> >> Note that to test patches/regressions for risc-v NOMMU builds, you can simply >> use buildroot and run: >> >> make sipeed_maix_bit_defconfig >> make >> >> To check that it builds. > > As I understand it, the change was wrong. It's conditionally compiling > the cancellation support only if SINGLE_THREAD_P is undefined, i.e. > only if there is no threads support. > > Instead, the cancellation support should probably be compiled only if > the cancellation macros are defined. But regardless, the code as it is > now is wrong.
OK. Then let's send a fix on top of that then. Reverting will only change this problem into another. Let's address both with a fix on top instead of reverting. > > Rich > -- Damien Le Moal Western Digital Research _______________________________________________ devel mailing list -- [email protected] To unsubscribe send an email to [email protected]
