On Mon, 24 Jun 2024 17:21:36 -0700
Andrii Nakryiko <[email protected]> wrote:
> Anyways, under exclusive writer lock, we double-check that refcount
> didn't change and is still zero. If it is, we proceed with destruction,
> because at that point we have a guarantee that find_active_uprobe()
> can't successfully look up this uprobe instance, as it's going to be
> removed in destructor under writer lock. If, on the other hand,
> find_active_uprobe() managed to bump refcount from zero to one in
> between put_uprobe()'s atomic_dec_and_test(&uprobe->ref) and
> write_lock(&uprobes_treelock), we'll deterministically detect this with
> extra atomic_read(&uprobe->ref) check, and if it doesn't hold, we
> pretend like atomic_dec_and_test() never returned true. There is no
> resource freeing or any other irreversible action taken up till this
> point, so we just exit early.
>
> One tricky part in the above is actually two CPUs racing and dropping
> refcnt to zero, and then attempting to free resources. This can happen
> as follows:
> - CPU #0 drops refcnt from 1 to 0, and proceeds to grab uprobes_treelock;
> - before CPU #0 grabs a lock, CPU #1 updates refcnt as 0 -> 1 -> 0, at
> which point it decides that it needs to free uprobe as well.
>
> At this point both CPU #0 and CPU #1 will believe they need to destroy
> uprobe, which is obviously wrong. To prevent this situation, we augment
> the refcount with an epoch counter, which is always incremented by 1 on
> either get or put operation. This allows those two CPUs above to disambiguate
> who should actually free uprobe (it's the CPU #1, because it has
> up-to-date epoch). See comments in the code and note the specific values
> of UPROBE_REFCNT_GET and UPROBE_REFCNT_PUT constants. Keep in mind that
> a single atomic64_t is actually two sort-of-independent 32-bit counters
> that are incremented/decremented with a single atomic64_add_return()
> operation. Note also a small and extremely rare (and thus having no
> effect on performance) need to clear the highest bit every 2 billion
> get/put operations to prevent high 32-bit counter from "bleeding over"
> into lower 32-bit counter.
I have a question here.
Is there any chance for CPU#1 to put the uprobe before CPU#0 takes
the uprobes_treelock, and thus free the uprobe before CPU#0 validates
uprobe->ref again? e.g.
CPU#0                                   CPU#1
put_uprobe() {
    atomic64_add_return()
                                        __get_uprobe();
                                        put_uprobe() {
                                            kfree(uprobe)
                                        }
    write_lock(&uprobes_treelock);
    atomic64_read(&uprobe->ref);
}
I think this is a very rare case, but I could not find any code that
prevents this scenario.
Thank you,
--
Masami Hiramatsu (Google) <[email protected]>