On 2023/06/08 9:18, [email protected] wrote:
> On Thu, Jun 08, 2023 at 12:03:25AM +0000, Damien Le Moal wrote:
>> On 2023/06/07 23:57, Yann Sionneau wrote:
>>> Hello uClibc-ng hackers, Damien, Rich,
>>>
>>> I am sending this email to discuss the possibility to revert 
>>> https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/commit/?id=08d46f1ce21e4ec51b2b1626beeaea6cbe7fdc6b
>>>
>>> I think this change is wrong, I have been tracking an issue for a few 
>>> days and I am pretty sure the issue comes from this commit.
>>>
>>> Basically what happens is that with this commit, tst-cancel18 test is 
>>> failing.
>>>
>>> The test is available over there: 
>>> https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-cancel18.c
>>>
>>> The reason for this fail is that clock_nanosleep code from libc is badly 
>>> compiled: 
>>> https://elixir.bootlin.com/uclibc-ng/latest/source/librt/clock_nanosleep.c#L39
>>>
>>> Because SINGLE_THREAD_P is actually defined, in case your libc build 
>>> supports NPTL threads (which is, I think, the majority of configs, 
>>> except very old systems using LINUXTHREADS or NOMMU maybe).
>>>
>>> In my case (kvx arch):
>>>
>>> #define SINGLE_THREAD_P __builtin_expect (THREAD_GETMEM (THREAD_SELF, 
>>> header.multiple_threads) == 0, 1)
>>>
>>> See: 
>>> https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/sysdeps/unix/sysv/linux/kvx/sysdep-cancel.h#L28
>>>
>>> This results to clock_nanosleep code being compiled to a very simple 
>>> version that directly calls the syscall and does not change at all the 
>>> thread cancellability to asynchronous before calling the syscall.
>>>
>>> Therefore, when the test main thread calls pthread_cancel(), no signal 
>>> is sent (see 
>>> https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/pthread_cancel.c#L60)
>>>  
>>> and therefore the nanosleep syscall is never interrupted and the test fails.
>>>
>>> clock_nanosleep is defined to be a mandatory cancellation point by 
>>> POSIX, it must be interruptible in all cases by pthread_cancel, whether 
>>> in synchronous or asynchronous mode.
>>>
>>> I think the fix for the NOMMU case should be different, maybe checking 
>>> for if LIBC_CANCEL_ASYNC is defined ?
>>>
>>> Damien do you have time to work on a new fix for the NOMMU case? Are you 
>>> OK with the revert?
>>
>> Not OK with a revert as that will bring back the compilation errors for 
>> RISC-V
>> NOMMU case.
>>
>> Instead, please work on a fix for your case on top of the current code. If I
>> recall correctly, the code around this is rather messy with lots of ifdefs,
>> which may be why this regression was introduced. Cleaning up the code first 
>> to
>> avoid that may help avoiding further issues in the future.
>>
>> Note that to test patches/regressions for risc-v NOMMU builds, you can simply
>> use buildroot and run:
>>
>> make sipeed_maix_bit_defconfig
>> make
>>
>> To check that it builds.
> 
> As I understand it, the change was wrong. It's conditionally compiling
> the cancellation support only if SINGLE_THREAD_P is undefined, i.e.
> only if there is no threads support.
> 
> Instead, the cancellation support should probably be compiled only if
> the cancellation macros are defined. But regardless, the code as it is
> now is wrong.

OK. Then let's send a fix on top of that then. Reverting will only change this
problem into another. Let's address both with a fix on top instead of reverting.

> 
> Rich
> 

-- 
Damien Le Moal
Western Digital Research

_______________________________________________
devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to