Re: [PATCH v2 1/2] tracing: Add task_prctl_unknown tracepoint

Marco Elver Thu, 07 Nov 2024 07:59:12 -0800

On Thu, 7 Nov 2024 at 16:54, Mathieu Desnoyers
<mathieu.desnoy...@efficios.com> wrote:
>
> On 2024-11-07 10:46, Marco Elver wrote:
> > On Thu, 7 Nov 2024 at 16:45, Mathieu Desnoyers
> > <mathieu.desnoy...@efficios.com> wrote:
> >>
> >> On 2024-11-07 07:25, Marco Elver wrote:
> >>> prctl() is a complex syscall which multiplexes its functionality based
> >>> on a large set of PR_* options. Currently we count 64 such options. The
> >>> return value of unknown options is -EINVAL, and doesn't distinguish from
> >>> known options that were passed invalid args that also return -EINVAL.
> >>>
> >>> To understand if programs are attempting to use prctl() options not yet
> >>> available on the running kernel, provide the task_prctl_unknown
> >>> tracepoint.
> >>>
> >>> Note, this tracepoint is in an unlikely cold path, and would therefore
> >>> be suitable for continuous monitoring (e.g. via perf_event_open).
> >>>
> >>> While the above is likely the simplest usecase, additionally this
> >>> tracepoint can help unlock some testing scenarios (where probing
> >>> sys_enter or sys_exit causes undesirable performance overheads):
> >>>
> >>>     a. unprivileged triggering of a test module: test modules may register a
> >>>        probe to be called back on task_prctl_unknown, and pick a very large
> >>>        unknown prctl() option upon which they perform a test function for an
> >>>        unprivileged user;
> >>>
> >>>     b. unprivileged triggering of an eBPF program function: similar
> >>>        as idea (a).
> >>>
> >>> Example trace_pipe output:
> >>>
> >>>     test-484     [000] .....   631.748104: task_prctl_unknown: comm=test option=1234 arg2=101 arg3=102 arg4=103 arg5=104
> >>>
> >>
> >> My concern is that we start adding tons of special-case
> >> tracepoints to the implementation of system calls which
> >> are redundant with the sys_enter/exit tracepoints.
> >>
> >> Why favor this approach rather than hooking on sys_enter/exit ?
> >
> > It's __extremely__ expensive when deployed at scale. See note in
> > commit description above.
>
> I suspect you base the overhead analysis on the x86-64 implementation
> of sys_enter/exit tracepoint and especially the overhead caused by
> the SYSCALL_WORK_SYSCALL_TRACEPOINT thread flag, am I correct ?
>
> If that is causing a too large overhead, we should investigate if
> those can be improved instead of adding tracepoints in the
> implementation of system calls.


Doing that may be generally useful, but even if you improve it
somehow, there's always some additional bit of work needed on
sys_enter/exit as soon as a tracepoint is attached. Even if that's
just a few cycles, it's too much (for me at least).

Also: if you just hook sys_enter/exit, you can't tell from the return
code (-EINVAL) alone whether the prctl was handled or not. I want the
kernel to tell me if it handled the prctl() or not, and I also think
it's very bad design to copy-paste the running kernel's prctl() option
checking into a sys_enter/exit hook. That scales neither in terms of
performance nor maintainability.
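
To illustrate the -EINVAL ambiguity from userspace, here is a minimal
sketch (not part of the patch). It assumes option 1234, borrowed from the
example trace above, is unknown to the running kernel, and uses
PR_SET_DUMPABLE with an out-of-range value as the "known option, invalid
argument" case. The helper names prctl_errno() and show_ambiguity() are
made up for this example. Both calls are expected to fail with EINVAL, so
the caller cannot tell the two cases apart:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical option number taken from the example trace;
 * assumed to be unknown to the running kernel. */
#define UNKNOWN_PRCTL_OPTION 1234

/* Issue prctl() and return 0 on success, or the errno value on failure. */
static int prctl_errno(int option, unsigned long arg2, unsigned long arg3,
                       unsigned long arg4, unsigned long arg5)
{
    if (prctl(option, arg2, arg3, arg4, arg5) == -1)
        return errno;
    return 0;
}

/* Print what userspace observes for an unknown option versus a known
 * option that was passed an invalid argument. */
static void show_ambiguity(void)
{
    /* Case 1: option the kernel does not implement (assumption). */
    int unknown = prctl_errno(UNKNOWN_PRCTL_OPTION, 101, 102, 103, 104);

    /* Case 2: known option, but arg2 is not a valid dumpable value. */
    int bad_arg = prctl_errno(PR_SET_DUMPABLE, 12345, 0, 0, 0);

    printf("unknown option:        errno=%d (%s)\n", unknown, strerror(unknown));
    printf("known option, bad arg: errno=%d (%s)\n", bad_arg, strerror(bad_arg));
    printf("distinguishable from errno alone: %s\n",
           unknown != bad_arg ? "yes" : "no");
}
```

Calling show_ambiguity() from main() prints the errno seen for each case.
A sys_exit hook sees the same return value for both, which is why it
would have to duplicate the kernel's option list to tell them apart.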
