On Thu, 2026-06-18 at 18:04 +0800, Jun Miao wrote:
> During early boot, ksgxd (Intel Software Guard Extensions Kernel Thread)
> iterates over all post-kexec dirty EPC pages in a tight loop calling
> cond_resched() after each page. But, on isolated CPUs
> (a common configuration in cloud VMs), cond_resched() never triggers a
> real context switch because TIF_NEED_RESCHED is not set when no competing
> runnable task exists on that CPU.
>
> synchronize_rcu_tasks(), invoked by BPF LSM during initialization, must
> wait for every task that was running at the start of the grace period to
> pass through a quiescent state (a voluntary sleep or preemption point).
> If ksgxd never leaves the CPU, the rcu_tasks grace period stalls, causing
> boot delays exceeding 60 seconds on machines with large EPC regions.
>
> Fix this by introducing SGX_SANITIZE_RESCHED_INTERVAL (32768) and forcing
> ksgxd to sleep for one jiffy every that many pages, guaranteeing that an
> rcu_tasks quiescent state is reached in bounded time regardless of CPU
> isolation. Keep cond_resched() for all other iterations.
>
> Without this patch, instead, virtual machines (VMs) experience a long OS boot
> times:

This seems to be a general problem at the scheduler and RCU layer rather than
something specific to SGX. I originally thought cond_resched() would handle
this, but after reading some of the history, that is not the case.

IIUC, cond_resched_tasks_rcu_qs() was introduced specifically to address this
issue. See bde6c3aa9930 ("rcu: Provide cond_resched_rcu_qs() to force quiescent
states in long loops"); cond_resched_rcu_qs() was later renamed to
cond_resched_tasks_rcu_qs().

Could you try using it in place of cond_resched() in the ksgxd sanitization
loop, instead of adding the periodic sleep?
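
To illustrate the shape of the change, here is a rough, untested userspace
sketch. It is not the actual SGX code: sanitize_pages(), nr_dirty and
qs_reports are hypothetical names, and cond_resched_tasks_rcu_qs() is stubbed
out so the snippet compiles outside the kernel. In the kernel, that helper
reports a Tasks RCU quiescent state for the current task and then calls
cond_resched(), so the grace period can advance even when nothing else is
runnable on the CPU.

```c
#include <stddef.h>

/* Hypothetical counter: how many quiescent states the loop reported. */
static unsigned long qs_reports;

/*
 * Userspace stub standing in for the kernel helper of the same name.
 * The real helper notes a Tasks RCU quiescent state for current and
 * then calls cond_resched(); here we only count the calls.
 */
static void cond_resched_tasks_rcu_qs(void)
{
	qs_reports++;
}

/*
 * Hypothetical, simplified model of the sanitization loop: process
 * nr_dirty pages and report a quiescent state after each one, where
 * the current code calls plain cond_resched().
 */
static unsigned long sanitize_pages(unsigned long nr_dirty)
{
	unsigned long done = 0;

	while (done < nr_dirty) {
		done++;	/* stands in for sanitizing one EPC page */
		cond_resched_tasks_rcu_qs();
	}

	return done;
}
```

With this shape the loop body stays the same and no extra interval constant or
sleep is needed; whether it is enough on isolated CPUs would need to be
confirmed on an affected machine.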