On 2025-07-02 15:21, Steven Rostedt wrote:
On Wed, 2 Jul 2025 15:12:45 -0400
Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:

But you are missing one more thing that the trace can use, and that's
the time sequence. As soon as the same thread has a new id you can
assume all the older user space traces are not applicable for any new
events for that thread, or any other thread with the same thread ID.

In order for the scheme you describe to work, you need:

- instrumentation of task lifetime (exit/fork+clone),
- assurance that the events related to that instrumentation were not
    dropped.

I'm not sure about ftrace, but in LTTng enabling instrumentation of
task lifetime is entirely up to the user.

It has nothing to do with task lifetime. If you see a deferred request
with id of 1 from task 8888, and then later you see either a deferred
request or a stack trace with an id other than 1 for task 8888, you can
then say all events before now are no longer eligible for new deferred
stack traces.
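
As a rough illustration of the rule described above, here is a minimal
sketch of a trace consumer that keeps pending events per thread ID and
discards them as soon as a different id shows up for that thread. This is
not ftrace or LTTng code: the class name `Matcher`, the methods
`on_request` and `on_stacktrace`, and the use of plain strings as events
are all assumptions made up for this example.

```python
from collections import defaultdict


class Matcher:
    """Attach deferred user stack traces to earlier events of the same tid.

    Hypothetical sketch: 'cookie' stands for the deferred request id, and
    events are plain strings. No fork/exit events are consulted.
    """

    def __init__(self):
        self.current = {}                 # tid -> cookie currently pending
        self.pending = defaultdict(list)  # tid -> events awaiting a stack
        self.resolved = []                # (event, stack) pairs

    def _switch(self, tid, cookie):
        # A different cookie for this tid means older events can no longer
        # receive a deferred stack trace: discard them.
        if self.current.get(tid) != cookie:
            self.pending[tid].clear()
            self.current[tid] = cookie

    def on_request(self, tid, cookie, event):
        self._switch(tid, cookie)
        self.pending[tid].append(event)

    def on_stacktrace(self, tid, cookie, stack):
        self._switch(tid, cookie)
        for event in self.pending[tid]:
            self.resolved.append((event, stack))
        self.pending[tid].clear()
```

The sketch assumes events are consumed in time order, which is the
property the rule relies on.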


And even if it's enabled, events can be discarded (e.g. buffer full).

The only losing case is this: you see a deferred request with id 1 for
task 8888, then you start dropping all events; task 8888 exits and a new
task appears with thread ID 8888 that also has a deferred request with
id 1; then you start picking up events again and see a deferred stack
trace with id 1 for the new task 8888. In that case, you lose.

But other than that exact scenario, it should not get confused.
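
The losing scenario above can be replayed with a toy consumer that applies
only the id-change rule. This is a sketch under assumptions: the tuple
layout, the `replay` helper and the event names are invented for
illustration, and dropped events are modelled by simply leaving them out
of the input.

```python
def replay(events):
    """Match deferred stack traces to requests using only (tid, id).

    events: time-ordered tuples (kind, tid, cookie, payload), where kind is
    "request" or "stack". Hypothetical layout, for illustration only.
    Returns a list of (request payload, stack payload) pairs.
    """
    current = {}   # tid -> id currently pending
    pending = {}   # tid -> request payloads awaiting a stack trace
    matched = []
    for kind, tid, cookie, payload in events:
        if current.get(tid) != cookie:
            # New id for this tid: older requests are no longer eligible.
            current[tid] = cookie
            pending[tid] = []
        if kind == "request":
            pending[tid].append(payload)
        else:
            matched += [(req, payload) for req in pending[tid]]
            pending[tid] = []
    return matched


full = [
    ("request", 8888, 1, "old task: event A"),
    ("stack",   8888, 1, "old task stack"),
    # old task 8888 exits, tid is reused, new task also starts at id 1
    ("request", 8888, 1, "new task: event B"),
    ("stack",   8888, 1, "new task stack"),
]
# Simulate a full buffer: the two middle events are discarded.
lossy = [full[0], full[3]]

print(replay(full))
print(replay(lossy))   # event A is paired with the new task's stack
```

With the complete stream each event gets its own task's stack; with the
two middle events missing, nothing in the remaining stream tells the
consumer that thread ID 8888 now belongs to a different task.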

Correct.

Thus the only issue that can truly be a problem is if you have missed
events where thread id wraps around. I guess that could be possible if
a long-running task finally exits and its thread id is reused
immediately. Is that a common occurrence?

You just need a combination of thread ID re-use and either no
instrumentation of task lifetime or events discarded to trigger this.

Again, once you see a new request with another id for the same task, you
don't need to worry about it. You don't even need to look at fork and
exit events.

The reason why instrumentation of exit/{fork,clone} is useful is to
know when a thread ID is re-used.


Even if it's not so frequent, at large scale and in production, I
suspect that this will happen quite often.

Really? As I explained above?

Note that all newly forked threads will likely start counting near 0.
So chances are that for short-lived threads most of the counter values
will be in a low range.

So all you need is thread ID re-use for two threads which happen to use
the deferred cookies within low-value ranges to hit this.

From my perspective, making trace analysis results reliable is the most
basic guarantee tooling should provide in order to make it trusted by
users. So I am tempted to err towards robustness rather than take
shortcuts because "it does not happen often".

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
