Hi Kai,
>Some nits in spell/grammar below.  Sorry I didn't go through any spell/grammar
>check last night since it was already very late for me.
>
>Btw, the v4 is quicker than I expected anyway.  We are still in merge window so
>maintainers are not expected to review this.  For more info, see "Merge window"
>section in Documentation/process/maintainer-tip.rst.
>
>On Wed, 2026-06-24 at 22:20 +0800, Jun Miao wrote:
>> The kernel resets all EPC pages to a clean state in a loop before
>> using them for enclaves.  The number of EPC pages could be large
>> (e.g., GBs) thus resetting them could take a fair amount of time.
>> Because of that, during
>
>s/fair/significant
>
>> early boot, the kernel resets EPC pages through a kernel thread
>> ksgxd() and there's a cond_resched() after resetting each EPC page.
>>
>> This is fine in most cases, but becomes a problem when there's other
>> kernel code waiting for RCU-Tasks grace period but the cond_resched()
>> in ksgxd() never triggers rescheduling.  Because cond_resched()
>> doesn't report quiescent state when it doesn't trigger rescheduling,
>> the thread that is waiting for RCU-Tasks grace period will need to wait 
>> until all
>EPC pages are reset.
>>
>> For instance, BPF LSM subsystem can invoke synchronize_rcu_tasks() at
>> kernel boot time.  A VM with a large EPC assigned and have BPF LSM
>> enabled can take
>                                                ^
>Remove the 'have'.
>
>> a long time to boot, with a call trace triggered:
>>
>>     rcu_tasks_wait_gp: rcu_tasks grace period number 1 (since boot) is 130631
>jiffies old.
>>     INFO: task systemd:1 blocked for more than 122 seconds.
>>     ...
>>     task:systemd  state:D stack:0  pid:1  tpid:1  ppid:0  flags:0x00000002
>>     Call Trace:
>>     ...
>>     schedule_timeout+0x157/0x170
>>     wait_for_completion+0x88/0x150
>>     __wait_rcu_gp+0x17e/0x190
>>     synchronize_rcu_tasks_generic+0x64/0x60
>>     ...
>>     synchronize_rcu_tasks+0x15/0x20
>>     register_ftrace_direct+0x31f/0x350
>>     ...
>>     bpf_trampoline_link_prog+0x33/0x60
>>     bpf_tracing_prog_attach+0x3c5/0x5f0
>>
>> Replace cond_resched() with cond_resched_tasks_rcu_qs() which
>> explicitly report quiescent
>
>s/report quiescent/reports quiescent state
>
>> regardless whether actual rescheduling is triggered.  Resetting all
>> EPC pages in ksgxd()
>
>s/regardless/regardless of
>
>> isn't performance critical so the extra cost of cond_resched_tasks_rcu_qs() 
>> isn't
>a problem.
>>
>> Tests showed this reduced the VM kernel boot time from ~50s to ~700ms.
>>
>> Reported-by: Challvy Tee <[email protected]>
>> Link: https://github.com/systemd/systemd/issues/40423
>> Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page
>> Cache (EPC) sections")
>> Tested-by: Challvy Tee <[email protected]>
>> Suggested-by: Kai Huang <[email protected]>
>> Co-developed-by: Fan Du <[email protected]>
>> Signed-off-by: Fan Du <[email protected]>
>> Signed-off-by: Jun Miao <[email protected]>
>
>
>Btw, English isn't my first language, so feel free to use AI to improve the 
>above
>changelog if you want (I just did but not sure the changes are all better 
>though).

Thank you for your detailed guidance. I was still a bit careless.
Got it all your review, I will review commit log through AI again before push 
V5 to make it more perfect.

>Anyway, feel free to add:
>
>Reviewed-by: Kai Huang <[email protected]>

It is my pleasure :-)

---
Warm regards
Jun Miao


Reply via email to