On Tue, Mar 10, 2026 at 12:24:02PM +0100, Matthieu Baerts wrote:

> Just did. Output is available there:
> 
>   
> https://github.com/user-attachments/files/25867817/issue-617-debug-20260310.txt.gz
> 
> Only 7.7k lines this time.

Same damn thing again...

[    2.533811] virtme-n-1         3d..1. 849756us : mmcid_user_add: pid=1 
users=1 mm=000000002b3f8459
[    4.523998] virtme-n-1         3d..1. 1115085us : mmcid_user_add: pid=71 
users=2 mm=000000002b3f8459
[    4.529065] virtme-n-1         3d..1. 1115937us : mmcid_user_add: pid=72 
users=3 mm=000000002b3f8459

[    4.529448] virtme-n-71        2d..1. 1115969us : mmcid_user_add: pid=73 
users=4 mm=000000002b3f8459         <=== missing!
[    4.529946] virtme-n-71        2d..1. 1115971us : mmcid_getcid: 
mm=000000002b3f8459 cid=00000003

71 spawns 73, assigns cid 3

[    4.530573]   <idle>-0         1d..2. 1115991us : sched_switch: 
prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> 
next_comm=virtme-ng-init next_pid=73 next_prio=120
[    4.530865]   <idle>-0         1d..2. 1115993us : mmcid_cpu_update: cpu=1 
cid=00000003 mm=000000002b3f8459

It gets scheduled on CPU-1, sets CID...

[    4.531038] virtme-n-1         3d..1. 1116013us : mmcid_user_add: pid=74 
users=5 mm=000000002b3f8459

Then 1 spawns 74 on CPU 3. That makes it the 5th user, so we initiate a
task->cpu cid transition:

[    4.531203] virtme-n-1         3d..1. 1116014us : mmcid_task_update: pid=1 
cid=20000000 mm=000000002b3f8459
[    4.531369] virtme-n-1         3d..1. 1116014us : mmcid_cpu_update: cpu=3 
cid=20000000 mm=000000002b3f8459

Task 1

[    4.531530] virtme-n-1         3..... 1116014us : mmcid_fixup_task: pid=71 
cid=00000001 active=1 users=4 mm=000000002b3f8459
[    4.531790] virtme-n-1         3d..2. 1116015us : mmcid_task_update: pid=71 
cid=80000000 mm=000000002b3f8459
[    4.532000] virtme-n-1         3d..2. 1116015us : mmcid_putcid: 
mm=000000002b3f8459 cid=00000001

Task 71

[    4.532169] virtme-n-1         3..... 1116015us : mmcid_fixup_task: pid=72 
cid=00000002 active=1 users=3 mm=000000002b3f8459
[    4.532362] virtme-n-1         3d..2. 1116016us : mmcid_task_update: pid=72 
cid=20000002 mm=000000002b3f8459
[    4.532514] virtme-n-1         3d..2. 1116016us : mmcid_cpu_update: cpu=0 
cid=20000002 mm=000000002b3f8459

Task 72

[    4.532649] virtme-n-1         3..... 1116016us : mmcid_fixup_task: pid=74 
cid=80000000 active=1 users=2 mm=000000002b3f8459

Task 74. Note the glaring lack of 73!!! which all this time is running
on CPU 1. Per the fact that it got scheduled, it must be on the
tasklist; per the fact that 1 spawns 74 after it on CPU 3, we must
observe any prior tasklist changes; and per the fact that it got a cid,
->active must be set. WTF!

That said, we set ->active after tasklist_lock now, so it might be
possible we simply miss that store, observe the 'old' 0, and skip over
it?

Let me stare hard at that...


[    4.532912] virtme-n-1         3..... 1116017us : mmcid_fixup_task: pid=71 
cid=80000000 active=1 users=1 mm=000000002b3f8459
[    4.533386] virtme-n-1         3d..2. 1116041us : mmcid_cpu_update: cpu=3 
cid=40000000 mm=000000002b3f8459

I *think* this is the for_each_process_thread() hitting 71 again.

[    4.533805]   <idle>-0         2d..2. 1116043us : mmcid_getcid: 
mm=000000002b3f8459 cid=00000001
[    4.533980]   <idle>-0         2d..2. 1116044us : mmcid_cpu_update: cpu=2 
cid=40000001 mm=000000002b3f8459
[    4.534156]   <idle>-0         2d..2. 1116044us : mmcid_task_update: pid=74 
cid=40000001 mm=000000002b3f8459
[    4.534579] virtme-n-72        0d..2. 1116046us : mmcid_cpu_update: cpu=0 
cid=40000002 mm=000000002b3f8459

[    4.535803] virtme-n-73        1d..2. 1116179us : sched_switch: 
prev_comm=virtme-ng-init prev_pid=73 prev_prio=120 prev_state=S ==> 
next_comm=swapper/1 next_pid=0 next_prio=120

And then, after all that, 73 blocks... not having been marked TRANSIT
or anything, and thus holding on to the CID, leading to all this trouble.



