On Sun, Apr 23, 2023 at 10:22 AM Gregory Nutt <spudan...@gmail.com> wrote:

> These are internal OS interfaces. In NuttX, sched_lock() is exported as
> a standard application interface. That, I think, is wrong. An
> application should not require such an interface, there is no precedent
> in any standard for application control of the pre-emption state, and,
> in fact, I think it is a security issue that we permit this. Does it
> really make sense that any application should be able to just stop the
> scheduler????
>
> There are many uses of sched_lock() in apps/ now:
(snip)

> It might be possible to re-design some of these to avoid disabling
> pre-emption. I am not sure. We would probably lose some of them and
> the re-design of the others could be a very complicated job.
>
> OStest, for example, requires sched_lock() to control the order of
> execution of threads to set up test cases. Without sched_lock(), we
> could not test the OS from user space. OStest has historically used
> internal OS interfaces in order to implement semi-white-box testing.

Possibly stupid idea: maybe OStest needs to become part of the OS and not
part of apps?

> My position is:
>
> * We have to keep sched_lock() as an OS internal interface and retain
>   its current behavior.
> * Re-naming the interface, documenting the interface, doing whatever
>   we can to assure proper usage is fine.
> * I have mixed feelings about removing user access to sched_lock(). I
>   think it is the correct thing to do, but I don't know if I would
>   like the consequences of that decision.
>
> NOTE: sched_lock() was documented in the past, but that documentation
> was removed from the repository and from general access for some reason:
> https://cwiki.apache.org/confluence/display/NUTTX/User+Guide#schedlock

From that page:

"Description: This function disables context switching by disabling the
addition of new tasks to the ready-to-run task list. The task that calls
this function will be the only task that is allowed to run until it
either calls sched_unlock (the appropriate number of times) or until it
blocks itself."

I would suggest adding something like: "Interrupts are not disabled,
allowing real-time interrupt response to take place."

Also on that page:

"POSIX Compatibility: This is a NON-POSIX interface.
VxWorks provides the comparable interface: STATUS taskLock(void);"

I would suggest adding the other known similar interfaces that Greg just
mentioned:

> * Linux has pre-empt disable:
>   https://www.kernel.org/doc/Documentation/preempt-locking.txt
>   https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/preempt.c#L58
> * Zephyr has k_sched_lock():
>   https://docs.zephyrproject.org/3.0.0/reference/kernel/scheduling/index.html

As we learn of and confirm additional ones from other OSes, we can
continue to add them. Gradually, over time, I recommend adding such
documentation to other interfaces as well. That could help people
migrating from other RTOSes to NuttX.

The file sched/sched/sched_lock.c contains a rather good explanation IMO:

/* Pre-emption is disabled via the interface sched_lock(). sched_lock()
 * works by preventing context switches from the currently executing task.
 * This prevents other tasks from running (without disabling interrupts) and
 * gives the currently executing task exclusive access to the (single) CPU
 * resources. Thus, sched_lock() and its companion, sched_unlock(), are
 * used to implement some critical sections.
 *
 * In the single CPU case, pre-emption is disabled using a simple lockcount
 * in the TCB. When the scheduler is locked, the lockcount is incremented;
 * when the scheduler is unlocked, the lockcount is decremented. If the
 * lockcount for the task at the head of the g_readytorun list is > 0,
 * then pre-emption is disabled.
 *
 * No special protection is required since only the executing task can
 * modify its lockcount.
 */

#ifdef CONFIG_SMP
/* In the multiple CPU, SMP case, disabling context switches will not give a
 * task exclusive access to the (multiple) CPU resources (at least without
 * stopping the other CPUs): Even though pre-emption is disabled, other
 * threads will still be executing on the other CPUs.
 *
 * There are additional rules for this multi-CPU case:
 *
 * 1. There is a global lock count 'g_cpu_lockset' that includes a bit for
 *    each CPU: If the bit is '1', then the corresponding CPU has the
 *    scheduler locked; if '0', then the CPU does not have the scheduler
 *    locked.
 * 2. Scheduling logic would set the bit associated with the cpu in
 *    'g_cpu_lockset' when the TCB at the head of the g_assignedtasks[cpu]
 *    list transitions to 'lockcount' > 0. This might happen when
 *    sched_lock() is called, or after a context switch that changes the
 *    TCB at the head of the g_assignedtasks[cpu] list.
 * 3. Similarly, the cpu bit in the global 'g_cpu_lockset' would be cleared
 *    when the TCB at the head of the g_assignedtasks[cpu] list has
 *    'lockcount' == 0. This might happen when sched_unlock() is called, or
 *    after a context switch that changes the TCB at the head of the
 *    g_assignedtasks[cpu] list.
 * 4. Modification of the global 'g_cpu_lockset' must be protected by a
 *    spinlock, 'g_cpu_schedlock'. That spinlock would be taken when
 *    sched_lock() is called, and released when sched_unlock() is called.
 *    This assures that the scheduler does enforce the critical section.
 *    NOTE: Because of this spinlock, there should never be more than one
 *    bit set in 'g_cpu_lockset'; attempts to set additional bits should
 *    cause the CPU to block on the spinlock. However, additional bits
 *    could get set in 'g_cpu_lockset' due to the context switches on the
 *    various CPUs.
 * 5. Each time the head of a g_assignedtasks[] list changes and the
 *    scheduler modifies 'g_cpu_lockset', it must also set 'g_cpu_schedlock'
 *    depending on the new state of 'g_cpu_lockset'.
 * 6. Logic that currently uses the currently running task's lockcount
 *    instead uses the global 'g_cpu_schedlock'. A value of SP_UNLOCKED
 *    means that no CPU has pre-emption disabled; SP_LOCKED means that at
 *    least one CPU has pre-emption disabled.
 */

(Note: the last two rules were both numbered '5.' in the source comment; I
have renumbered the second one to '6.' here.)

If anyone can point out what is missing/wrong in the above, that would be
helpful.
The function's docstring could be expanded a little bit to explain that
interrupts are not disabled. Currently it does not say that:

/****************************************************************************
 * Name: sched_lock
 *
 * Description:
 *   This function disables context switching by disabling the addition
 *   of new tasks to the g_readytorun task list. The task that calls
 *   this function will be the only task that is allowed to run until it
 *   either calls sched_unlock() (the appropriate number of times) or
 *   until it blocks itself.
 *
 * Input Parameters:
 *   None
 *
 * Returned Value:
 *   OK on success; ERROR on failure
 *
 ****************************************************************************/

Did we ever have a document that explains when to call sched_lock() vs.
when to call enter_critical_section()? If not, that could be useful. Most
of what was discussed in this thread can be cleaned up a little bit and
made into that documentation.

Cheers,
Nathan