patacongo edited a comment on issue #3868: URL: https://github.com/apache/incubator-nuttx/issues/3868#issuecomment-856917238

It does look like the main thread has exited or that the tg_pid is not valid for some reason. The tg_pid is set when the task group is created and is never changed until the task group is de-allocated. So unless there is a wild write, I would expect the tg_pid to be valid.

The more likely thing is that the main thread has exited. That is really very likely under some circumstances. Pthreads are not really canceled on exit() in deferred cancellation mode: they may only be marked as deleted and will continue to run until the thread calls a cancellation point. That could be some time later. Meanwhile, the main thread could exit, couldn't it?

1. Main thread calls exit()
2. All pthreads are canceled, but one or more persists because they are in deferred cancellation mode
3. pthread exits and attempts to call data destructors
4. That fails and the assertion occurs because the main thread has already exited.

Or, if a pthread is waiting on a semaphore and is canceled, but the OS logic ignores the ECANCELED error, then the pthread will never exit.

In either case, tg_pid will be valid in these states, but nxsched_get_tcb() will fail.

Perhaps there is some kind of protection that I do not see, but I think this could happen in normal operation, couldn't it? If so, then one solution would be to remove the assertion.

I suspect that there are other possibilities for race conditions. For example, I think this could cause a similar problem: What happens when a pthread calls exit()? In that case, it looks like the main thread could be killed before that pthread is killed. group_kill_children() will kill the main thread and the pthread will be the last thread to exit (via pthread_exit). Then when that pthread calls its destructors, the main thread would not exist and the assertion would occur.

1. pthread calls exit()
2. All threads are killed or canceled except for the calling pthread
3. Calling pthread exits and attempts to call data destructors
4. Assertion occurs because the main thread was killed in step 2.

I am just speculating about ways that the ordering could change due to race conditions. I don't know if any of the above are truly possible.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
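
The deferred cancellation behavior that the first scenario in the comment relies on can be illustrated with plain POSIX calls. The following is only a minimal sketch, not NuttX code: it does not reproduce the exit() path, the data destructors, or the assertion, and the names `deferred_cancel_demo`, `worker`, and the `g_*` counters are hypothetical. It only shows that a thread in deferred cancellation mode keeps running after pthread_cancel() returns, until it reaches a cancellation point.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical bookkeeping for the demo (not NuttX names) */

static atomic_int g_started;           /* Worker has set its cancel mode */
static atomic_int g_cancel_sent;       /* pthread_cancel() has returned */
static atomic_int g_work_after_cancel; /* Work done while cancel is pending */
static atomic_int g_past_testcancel;   /* Set only if cancellation is missed */

static void *worker(void *arg)
{
  (void)arg;

  /* These are the POSIX defaults; set explicitly for clarity */

  pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
  pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
  atomic_store(&g_started, 1);

  /* Spin without calling any cancellation point until the cancel
   * request has been issued by the other thread.
   */

  while (!atomic_load(&g_cancel_sent))
    {
    }

  /* The cancel request is now pending, yet this thread still runs */

  for (int i = 0; i < 1000; i++)
    {
      atomic_fetch_add(&g_work_after_cancel, 1);
    }

  /* First cancellation point: the pending request is acted on here */

  pthread_testcancel();

  atomic_store(&g_past_testcancel, 1); /* Not expected to be reached */
  return NULL;
}

/* Returns 1 if the worker terminated through cancellation */

int deferred_cancel_demo(void)
{
  pthread_t tid;
  void *result = NULL;
  int ret;

  ret = pthread_create(&tid, NULL, worker, NULL);
  assert(ret == 0);

  while (!atomic_load(&g_started))
    {
      sched_yield();
    }

  pthread_cancel(tid);             /* Only marks the thread as canceled */
  atomic_store(&g_cancel_sent, 1);

  pthread_join(tid, &result);
  return result == PTHREAD_CANCELED;
}
```

Calling `deferred_cancel_demo()` from a test program and then reading `g_work_after_cancel` shows how much work the worker completed between the cancel request and its first cancellation point. In the scenario from the comment, that window is where the main thread could finish exiting.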
