Sashiko raised a question about pidfd_get_task() and PIDFD_THREAD [1],
so I ran some tests to understand the behavior.
[1] https://sashiko.dev/#/patchset/[email protected]

pidfd_get_task() always resolves pidfds using PIDTYPE_TGID (kernel/pid.c
line 640), regardless of whether the pidfd was created with PIDFD_THREAD.
This means:

 - A PIDFD_THREAD pidfd for a non-leader thread fails with ESRCH.
 - A regular pidfd for a process whose leader has exited (pthread_exit
   in main, secondary thread still alive) also fails with ESRCH.

This is not specific to my patch: process_madvise() uses pidfd_get_task()
in the same way and has the same behavior. I wrote a test program
confirming this:

  
https://github.com/alban/tests/tree/alban_pvm_flags/pvm_flags/pidfd_thread_test

Results summary:

  All threads alive:
    pidfd_open(pid, 0)              + process_vm_readv: OK
    pidfd_open(tid, PIDFD_THREAD)   + process_vm_readv: OK (leader tid)
    pidfd_open(tid, PIDFD_THREAD)   + process_vm_readv: ESRCH (non-leader)

  Leader thread exited (secondary still alive):
    pidfd_open(pid, 0)              + process_vm_readv: ESRCH
    pidfd_open(pid, PIDFD_THREAD)   + process_vm_readv: ESRCH
    pidfd_open(tid, PIDFD_THREAD)   + process_vm_readv: ESRCH (non-leader)
    process_vm_readv(tid, flags=0)                    : OK (plain TID path)

  process_madvise() behaves identically in all cases above.
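
For readers who want to poke at the "all threads alive" rows without the
patch applied, here is a minimal userspace sketch using process_madvise(),
which goes through pidfd_get_task() as described above. It is not the test
program linked earlier. The helper names (probe_pidfd,
probe_non_leader_thread) are made up for illustration, the #ifndef
fallbacks assume the generic syscall numbers and PIDFD_THREAD == O_EXCL
from the UAPI headers, and MADV_COLD is just one advice value I assume the
running kernel accepts. PIDFD_THREAD needs a kernel that supports it;
older kernels reject the flag in pidfd_open().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed values, see lead-in). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Open a pidfd for `id` with `flags`, then call process_madvise() on one
 * anonymous page through it. Returns 0 on success, or the negative errno
 * of whichever step failed.
 */
static int probe_pidfd(pid_t id, unsigned int flags)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ret = 0;

    if (buf == MAP_FAILED)
        return -errno;

    int fd = (int)syscall(SYS_pidfd_open, (long)id, (long)flags);
    if (fd < 0) {
        ret = -errno;
    } else {
        struct iovec iov = { .iov_base = buf, .iov_len = (size_t)page };

        if (syscall(SYS_process_madvise, (long)fd, &iov, 1UL,
                    (long)MADV_COLD, 0UL) < 0)
            ret = -errno;
        close(fd);
    }
    munmap(buf, (size_t)page);
    return ret;
}

/* Thread body: probe a PIDFD_THREAD pidfd for this (non-leader) thread. */
static void *non_leader_main(void *arg)
{
    int *ret = arg;

    *ret = probe_pidfd((pid_t)syscall(SYS_gettid), PIDFD_THREAD);
    return NULL;
}

/* Run the PIDFD_THREAD probe from a secondary thread and return its result. */
static int probe_non_leader_thread(void)
{
    pthread_t thread;
    int ret = 0;
    int err = pthread_create(&thread, NULL, non_leader_main, &ret);

    if (err)
        return -err;
    pthread_join(thread, NULL);
    return ret;
}
```

Calling probe_pidfd(getpid(), 0), probe_pidfd(getpid(), PIDFD_THREAD) and
probe_non_leader_thread() from main() corresponds to the three "all
threads alive" rows; comparing the last return value against -ESRCH is
the interesting check.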

For the non-leader thread case when all threads are alive, this is fine in
practice: all threads share the same mm_struct, so profilers just use a regular
pidfd for the thread-group leader.

However, the exited-leader case is a real limitation for profilers.
OpenTelemetry eBPF Profiler wants to profile a process where the main thread
has exited but secondary threads are still running [2].
[2] https://github.com/open-telemetry/opentelemetry-ebpf-profiler/pull/376

Using plain TIDs (flags=0) would work, but it means users cannot use
PROCESS_VM_PIDFD in this scenario.
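
To illustrate the plain TID path, here is a small sketch in which a
secondary thread reads a buffer from its own address space by passing its
TID to the existing process_vm_readv() with flags=0. It only shows that a
non-leader TID is accepted on this path; it does not reproduce the
exited-leader scenario. The names (secret, read_via_tid, tid_read_demo)
are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Data living in the target address space that we try to read back. */
static const char secret[] = "hello from the target";

/*
 * Read `len` bytes of `secret` from the address space of thread `tid`.
 * Returns the number of bytes read, or a negative errno on failure.
 */
static long read_via_tid(pid_t tid, char *out, size_t len)
{
    struct iovec local = { .iov_base = out, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)secret, .iov_len = len };
    ssize_t n = process_vm_readv(tid, &local, 1, &remote, 1, 0);

    return n < 0 ? -errno : (long)n;
}

/* Thread body: look up our own (non-leader) TID and read through it. */
static void *tid_reader_main(void *arg)
{
    long *result = arg;
    char buf[sizeof(secret)] = { 0 };
    long n = read_via_tid((pid_t)syscall(SYS_gettid), buf, sizeof(buf));

    /* A full read must return exactly the bytes of `secret`. */
    if (n == (long)sizeof(buf) && memcmp(buf, secret, sizeof(buf)) != 0)
        n = -EIO;
    *result = n;
    return NULL;
}

/* Run the read from a secondary thread; returns its byte count or -errno. */
static long tid_read_demo(void)
{
    pthread_t thread;
    long result = 0;
    int err = pthread_create(&thread, NULL, tid_reader_main, &result);

    if (err)
        return -err;
    pthread_join(thread, NULL);
    return result;
}
```

tid_read_demo() returns the number of bytes copied when the call is
permitted, and a negative errno otherwise (for example when the
environment filters the syscall).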

What do you think this patch should do? I see two options:
 - Address this limitation in a separate future patch that fixes
   pidfd_get_task() to use PIDTYPE_PID when PIDFD_THREAD is detected in
   f_flags, benefiting all callers (process_vm_readv, process_madvise,
   and any future users).
 - Address it in this patch series.
