On Sat, Apr 17, 2021 at 04:48:53PM +0000, Al Viro wrote:
> On Sat, Apr 17, 2021 at 07:36:39AM -0700, Alexei Starovoitov wrote:
> 
> > The kernel will perform the same work with FDs. The same locks are held
> > and the same execution conditions are in both cases. The LSM hooks,
> > fsnotify, etc will be called the same way.
> > It's no different if new syscall was introduced "sys_foo(int num)" that
> > would do { return close_fd(num); }.
> > It would opearate in the same user context.
> 
> Hmm...  unless I'm misreading the code, one of the call chains would seem to
> be sys_bpf() -> bpf_prog_test_run() -> ->test_run() -> ... -> bpf_sys_close().
> OK, as long as you make sure bpf_prog_get() does fdput() (i.e. that we
> don't have it restructured so that fdget()/fdput() pair would be lifted into
> bpf_prog_test_run(), with fdput() moved in place of bpf_prog_put()).

Got it. There is no fdget/put bracketing in the code.
On the way to test_run we do __bpf_prog_get() which does fdget and immediately
fdput after incrementing refcnt of the prog.
I believe this pattern is consistent everywhere in kernel/bpf/*

> Note that we *really* can not allow close_fd() on anything to be bracketed
> by fdget()/fdput() pair; we had bugs of that sort and, as the matter of fact,
> still have one in autofs_dev_ioctl().
> 
> The trouble happens if you have file F with 2 references, held by descriptor
> tables of different processes.  Say, process A has descriptor 6 refering to
> it, while B has descriptor 42 doing the same.  Descriptor tables of A and B
> are not shared with anyone.
> 
> A: fdget(6)   -> returns a reference to F, refcount _not_ touched
> A: close_fd(6)        -> rips the reference to F from descriptor table, does 
> fput(F)
>                  refcount drops to 1.
> B: close(42)  -> rips the reference to F from B's descriptor table, does 
> fput(F)
>                  This time refcount does reach 0 and we use task_work_add() to
>                  make sure the destructor (__fput()) runs before B returns to
>                  userland.  sys_close() returns and B goes off to userland.
>                  On the way out __fput() is run, and among other things,
>                  ->release() of F is executed, doing whatever it wants to do.
>                  F is freed.
> And at that point A, which presumably is using the guts of F, gets screwed.

Thanks for these details. That's really helpful.

>       So please, mark all call sites with "make very sure you never get
> here with unpaired fdget()".

Good point. Will add this comment.

>       BTW, if my reading (re ->test_run()) is correct, what limits the 
> recursion
> via bpf_sys_bpf()?

Glad you asked! This kind of code review questions are much appreciated.

It's an allowlist of possible commands in bpf_sys_bpf().
'case BPF_PROG_TEST_RUN:' is not there for this exact reason.
I'll add a comment to make it more obvious.

Reply via email to