On Tue, Mar 10, 2026 at 12:24:02PM +0100, Matthieu Baerts wrote:
> Just did. Output is available there:
>
> > https://github.com/user-attachments/files/25867817/issue-617-debug-20260310.txt.gz
>
> Only 7.7k lines this time.
Same damn thing again...

[    2.533811] virtme-n-1     3d..1.   849756us : mmcid_user_add: pid=1 users=1 mm=000000002b3f8459
[    4.523998] virtme-n-1     3d..1.  1115085us : mmcid_user_add: pid=71 users=2 mm=000000002b3f8459
[    4.529065] virtme-n-1     3d..1.  1115937us : mmcid_user_add: pid=72 users=3 mm=000000002b3f8459
[    4.529448] virtme-n-71    2d..1.  1115969us : mmcid_user_add: pid=73 users=4 mm=000000002b3f8459   <=== missing!
[    4.529946] virtme-n-71    2d..1.  1115971us : mmcid_getcid: mm=000000002b3f8459 cid=00000003

71 spawns 73, assigns cid 3.

[    4.530573] <idle>-0       1d..2.  1115991us : sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=virtme-ng-init next_pid=73 next_prio=120
[    4.530865] <idle>-0       1d..2.  1115993us : mmcid_cpu_update: cpu=1 cid=00000003 mm=000000002b3f8459

It gets scheduled on CPU-1, sets CID...

[    4.531038] virtme-n-1     3d..1.  1116013us : mmcid_user_add: pid=74 users=5 mm=000000002b3f8459

Then 1 spawns 74 on CPU 3; this is the 5th task, so we initiate a
task->cpu cid transition:

[    4.531203] virtme-n-1     3d..1.  1116014us : mmcid_task_update: pid=1 cid=20000000 mm=000000002b3f8459
[    4.531369] virtme-n-1     3d..1.  1116014us : mmcid_cpu_update: cpu=3 cid=20000000 mm=000000002b3f8459

Task 1.

[    4.531530] virtme-n-1     3.....  1116014us : mmcid_fixup_task: pid=71 cid=00000001 active=1 users=4 mm=000000002b3f8459
[    4.531790] virtme-n-1     3d..2.  1116015us : mmcid_task_update: pid=71 cid=80000000 mm=000000002b3f8459
[    4.532000] virtme-n-1     3d..2.  1116015us : mmcid_putcid: mm=000000002b3f8459 cid=00000001

Task 71.

[    4.532169] virtme-n-1     3.....  1116015us : mmcid_fixup_task: pid=72 cid=00000002 active=1 users=3 mm=000000002b3f8459
[    4.532362] virtme-n-1     3d..2.  1116016us : mmcid_task_update: pid=72 cid=20000002 mm=000000002b3f8459
[    4.532514] virtme-n-1     3d..2.  1116016us : mmcid_cpu_update: cpu=0 cid=20000002 mm=000000002b3f8459

Task 72.

[    4.532649] virtme-n-1     3.....  1116016us : mmcid_fixup_task: pid=74 cid=80000000 active=1 users=2 mm=000000002b3f8459

Task 74 -- note the glaring lack of 73!!! -- which all this time is
running on CPU 1. Per the fact that it got scheduled, it must be on the
tasklist; per the fact that 1 spawns 74 after it on CPU 3, we must
observe any prior tasklist changes; and per the fact that it got a cid,
->active must be set. WTF!

That said, we set ->active after tasklist_lock now, so it might be
possible we simply miss that store, observe the 'old' 0 and skip over
it? Let me stare hard at that...

[    4.532912] virtme-n-1     3.....  1116017us : mmcid_fixup_task: pid=71 cid=80000000 active=1 users=1 mm=000000002b3f8459
[    4.533386] virtme-n-1     3d..2.  1116041us : mmcid_cpu_update: cpu=3 cid=40000000 mm=000000002b3f8459

I *think* this is the for_each_process_thread() hitting 71 again.

[    4.533805] <idle>-0       2d..2.  1116043us : mmcid_getcid: mm=000000002b3f8459 cid=00000001
[    4.533980] <idle>-0       2d..2.  1116044us : mmcid_cpu_update: cpu=2 cid=40000001 mm=000000002b3f8459
[    4.534156] <idle>-0       2d..2.  1116044us : mmcid_task_update: pid=74 cid=40000001 mm=000000002b3f8459
[    4.534579] virtme-n-72    0d..2.  1116046us : mmcid_cpu_update: cpu=0 cid=40000002 mm=000000002b3f8459
[    4.535803] virtme-n-73    1d..2.  1116179us : sched_switch: prev_comm=virtme-ng-init prev_pid=73 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

And then after all that, 73 blocks... not having been marked TRANSIT or
anything, and thus holding on to the CID, leading to all this trouble.

