Sync using nxsched_foreach might cause long critical sections / deadlocks

Carlos Sanchez Wed, 07 May 2025 07:12:30 -0700

Hi all,

We have found a problem with using sync() on our platform.


sync() uses nxsched_foreach() to traverse the list of tasks and call
file_fsync() on each task's open files; that call might trigger a
block write. Such block writes can be long: in our case (AT45DB641E)
they can take up to 5 ms. nxsched_foreach() runs inside a critical
section, which means up to 5 ms with no IRQ handling. That is fatal
for, among other things, our UART communication.

This is our current problem, but to make things worse, I see that in
recent NuttX releases (we are using 12.1.0 and plan to upgrade) the
scheduler is also locked in nxsched_foreach(). While I understand the
reason for this, it means a device operation is called with the
scheduler locked. That device write might require worker threads (our
filesystem driver does), so this will effectively cause a deadlock.

We are not sure of the best way to solve this while keeping sync()
semantics. POSIX does not require sync() to wait for actual physical
storage, but it *does* require fsync() to wait, and it makes sense to
implement sync() using fsync(). Maybe we could gather the list of open
files first and then call file_fsync() on that list (with some
checking and/or locking of each task's file list) outside of the
critical section? Even so, files might be opened or closed, or tasks
might change, after the list has been gathered. IMHO this is the best
way forward, but it is not so straightforward.
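
To make the idea concrete, here is a minimal sketch of the two-phase
approach in plain C. It is only an illustration under assumptions:
demo_file, demo_task, demo_lock(), demo_unlock() and demo_fsync() are
hypothetical stand-ins invented for this example, not NuttX APIs, and
the lock is modelled as a simple counter. The real fix would need the
actual task and file-list structures and their locking rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 4                    /* hypothetical limits for the sketch */
#define MAX_FDS   4
#define MAX_SNAP  (MAX_TASKS * MAX_FDS)

/* Hypothetical stand-in for an open file (NOT the NuttX struct file). */
struct demo_file
{
  bool open;    /* false once the file has been closed */
  int  refs;    /* reference count keeps the object alive while queued */
  int  synced;  /* how many times the slow flush ran on this file */
};

/* Hypothetical stand-in for a task and its file list. */
struct demo_task
{
  struct demo_file *fds[MAX_FDS];
};

static struct demo_task g_tasks[MAX_TASKS];
static int g_lock_depth;        /* models the critical section */
static int g_slow_ops_in_lock;  /* slow calls made while the lock was held */

static void demo_lock(void)   { g_lock_depth++; }
static void demo_unlock(void) { g_lock_depth--; }

/* Stand-in for file_fsync(): the potentially long block write. */
static void demo_fsync(struct demo_file *f)
{
  if (g_lock_depth > 0)
    {
      g_slow_ops_in_lock++;     /* this is what we want to avoid */
    }

  f->synced++;
}

/* Phase 1: short critical section. Only copy pointers and take
 * references; no device I/O happens here.
 */
static size_t demo_snapshot(struct demo_file **out, size_t max)
{
  size_t n = 0;

  demo_lock();
  for (int t = 0; t < MAX_TASKS; t++)
    {
      for (int fd = 0; fd < MAX_FDS; fd++)
        {
          struct demo_file *f = g_tasks[t].fds[fd];
          if (f != NULL && f->open && n < max)
            {
              f->refs++;
              out[n++] = f;
            }
        }
    }

  demo_unlock();
  return n;
}

/* Phase 2: flush outside the critical section, then drop the
 * references. Returns the number of files in the snapshot.
 */
static int demo_sync(void)
{
  struct demo_file *snap[MAX_SNAP];
  size_t n = demo_snapshot(snap, MAX_SNAP);

  for (size_t i = 0; i < n; i++)
    {
      if (snap[i]->open)        /* re-check: may have closed since phase 1 */
        {
          demo_fsync(snap[i]);
        }

      demo_lock();
      snap[i]->refs--;          /* release the reference taken in phase 1 */
      demo_unlock();
    }

  return (int)n;
}
```

The reference count is what covers the "file closed after the list
was gathered" case in this sketch: a closed file stays allocated
until the snapshot drops its reference, and the open flag is checked
again before flushing. Files opened after the snapshot are simply not
flushed by this call, and a file shared by several tasks would be
flushed once per entry.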

We would like to know what others think, so that we can make our
fix/workaround compatible with future changes.

BR

Carlos

-- 

Carlos Sanchez (he, him, his)
Geotab

Senior Team Lead, Embedded Development | Europe

