Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-07 Thread Linus Torvalds
On Mon, Dec 7, 2020 at 1:10 AM Peter Zijlstra wrote:
>
> > PeterZ, is there something I'm missing?
>
> Like this?
>
>   https://lkml.kernel.org/r/20200828123720.gz1362...@hirez.programming.kicks-ass.net

Yes, except I think you should remove the old ptrace_may_access() check.

Because otherwise we'll just end up having KCSAN complain about the
unlocked optimistic accesses or something like that.

So do the perfmon_capable() check early - it doesn't need the
exec_update_mutex - and then just do the ptrace_may_access() one late.

I don't see any point at all in checking privileges twice, and I do
see real downsides. Not just that KCSAN issue, but also lack of
coverage (ie the second check will then effectively never be tested,
which is bad too).
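
Roughly, the ordering being suggested (an illustrative sketch only, not
the actual kernel/events/core.c code; the needs_privilege condition and
the goto labels are stand-ins, not real identifiers):

	/* Early: cheap capability check, no exec_update_mutex needed. */
	if (needs_privilege && !perfmon_capable())
		return -EACCES;

	event = perf_event_alloc(...);
	...

	/* Late: the single privilege check, just before the install. */
	err = mutex_lock_killable(&task->signal->exec_update_mutex);
	if (err)
		goto err_task;
	if (!ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) {
		err = -EACCES;
		goto err_unlock;
	}
	...
	/* This is the point on no return; we cannot fail hereafter. */
	perf_install_in_context(ctx, event, event->cpu);
	mutex_unlock(&task->signal->exec_update_mutex);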

   Linus


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-07 Thread Peter Zijlstra
On Sat, Dec 05, 2020 at 12:05:32PM -0600, Eric W. Biederman wrote:
> I am trying to understand why the permission check is there.

It's about observability: is task A allowed to observe the state of task B?

By installing a perf event on another task, we can very accurately tell
what it's doing, which isn't fundamentally different from attaching a
debugger (ie. ptrace).

Therefore we chose to use the same security checks. As is good custom,
one does security checks early.

Then Jann came along and observed the race against execve mucking with
privs, and we ended up holding that mutex across a lot of code.

The patch I proposed earlier should solve all of that.
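
For reference, the race Jann found, roughly (an illustrative
interleaving reconstructed from the discussion around commit
79c9ce57eb2, not a quote of it):

	/*
	 *   attacker                         victim
	 *   --------                         ------
	 *   perf_event_open(pid = victim)
	 *     ptrace_may_access() -> OK
	 *                                    execve() a setuid binary
	 *                                      -> creds are upgraded
	 *     perf_install_in_context()
	 *       -> the event now observes a privileged task that the
	 *          attacker was only allowed to observe before the exec
	 */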


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-07 Thread Peter Zijlstra
On Fri, Dec 04, 2020 at 12:48:18PM -0800, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger wrote:
> >
> > >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
> 
> Side note: this one looks like it should be easy to fix.

> PeterZ, is there something I'm missing?

Like this?

  https://lkml.kernel.org/r/20200828123720.gz1362...@hirez.programming.kicks-ass.net


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-05 Thread Eric W. Biederman
Davidlohr Bueso writes:

> On Fri, 04 Dec 2020, Linus Torvalds wrote:
>
>> On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger wrote:
>>
>>> >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>>
>>Side note: this one looks like it should be easy to fix.
>>
>>Is there any real reason why exec_update_mutex is actually gotten that
>>early, and held for that long in the perf event code?
>
> afaict just to validate the whole operation early. Per 79c9ce57eb2 the
> mutex will guard the check and the perf_install_in_context vs exec.
>
>>
>>I _think_ we could move the ptrace check to be much later, to _just_ before that
>>
>> * This is the point on no return; we cannot fail hereafter.
>>
>>point in the perf event install chain..
>
> Peter had the idea of doing the ptrace_may_access() check twice: first
> lockless and early, then under exec_update_mutex when it mattered right
> before perf_install_in_context():
>
> https://lore.kernel.org/linux-fsdevel/20200828123720.gz1362...@hirez.programming.kicks-ass.net/
>
>>
>>I don't think it needs to be moved down even that much, I think it
>>would be sufficient to move it down below the "perf_event_alloc()",
>>but I didn't check very much.
>
> Yeah we could just keep a single ptrace_may_access() check just further
> down until it won't deadlock.

I am trying to understand why the permission check is there.

The ptrace_may_access check came in with the earliest version of the
code I can find.  So going down that path hasn't helped.

There are about 65 callers of perf_pmu_register, so it definitely makes
me nervous to hold a semaphore over perf_event_alloc.

What is truly silly in all of this is that perf_uprobe_event_init fails
if !perfmon_capable().  Which means the ptrace_may_access check and the
holding of exec_update_mutex are completely irrelevant to the function
of create_local_trace_uprobe.  Which is strange in and of itself.  I
would have thought uprobes would be the kind of probe that is safe for
ordinary users to use.
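
The check in question, paraphrased (trimmed from perf_uprobe_event_init
in kernel/trace/trace_event_perf.c as of this thread; worth
double-checking against the tree):

	static int perf_uprobe_event_init(struct perf_event *event)
	{
		if (event->attr.type != perf_uprobe.type)
			return -ENOENT;

		/* uprobe perf events are for privileged users only */
		if (!perfmon_capable())
			return -EACCES;
		...
	}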

This is a much deeper rabbit hole than I had anticipated when I started
looking and I will have to come back to this after the weekend.

If at all possible I would love to be able to move the grabbing of
exec_update_mutex to after perf_event_alloc and the pluggable functions
it calls.  It doesn't feel possible to audit all of those.

On the flip side the task is passed in as hw.target.  So it may not be
possible to move the permission check.  It is chaos.
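
For context, the coupling mentioned above, paraphrased from
perf_event_alloc() (from memory of the tree around v5.10; check the
source for the exact form):

	if (task) {
		event->attach_state = PERF_ATTACH_TASK;
		/*
		 * pmu::event_init needs to know which task to account to,
		 * and the ctx is not available yet at this point.
		 */
		event->hw.target = get_task_struct(task);
	}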

Eric


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-05 Thread Eric W. Biederman
Linus Torvalds writes:

> On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman wrote:
>>
>> From a deadlock perspective the change is strictly better than what we
>> have today.  The readers will no longer block on each other.
>
> No, agreed, it's better regardless.
>
>> For the specific case that syzbot reported it is readers who were
>> blocking on each other, so that specific case is fixed.
>
> So the thing is, a reader can still block another reader if a writer
> comes in between them. Which is why I was thinking that the same
> deadlock could happen if somebody does an execve at just the right
> point.

>> On the write side of exec_update_lock we have:
>>
>> cred_guard_mutex -> exec_update_lock
>>
>> Which means that to get an ABBA deadlock cred_guard_mutex would need to
>> be involved
>
> No, see above: you can get a deadlock with
>
>  - first reader gets exec_update_lock
>
>  - writer on exec_update_lock blocks on first reader (this is exec)
>
>  - second reader of exec_update_lock now blocks on the writer.
>
> So if that second reader holds something that the first one wants to
> get (or is the same thread as the first one), you have a deadlock: the
> first reader will never make any progress, will never release the
> lock, and the writer will never get it, and the second reader will
> forever wait for the writer that is ahead of it in the queue.

Oh.  I see what you are saying.  I knew the writer had to be involved
and block, but if a reader comes first, all that has to be added to the
situation is an exec that attempts to take the lock.  Any reader will
block the writer.

For some reason I was blind to the fact that a reader could block the
writer, and I was looking for anything else that might block the writer.
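
Spelled out as a timeline (illustrative; the queued/fair behaviour of
the rwsem is what makes the second reader wait):

	/*
	 *   T1: down_read(&sig->exec_update_lock);   // granted
	 *   T2: down_write(&sig->exec_update_lock);  // exec, queued behind T1
	 *   T3: down_read(&sig->exec_update_lock);   // queued behind T2
	 *
	 * If T1 now needs something T3 holds (or T1 == T3), nothing
	 * progresses: T1 waits on T3, T3 waits on T2, T2 waits on T1.
	 */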

> And that deadlock looks very much like what syzcaller detected, in
> exactly that scenario:
>
>   Chain exists of:
>     &sig->exec_update_mutex --> sb_writers#4 --> &p->lock
>
>    Possible unsafe locking scenario:
>
>          CPU0                    CPU1
>          ----                    ----
>     lock(&p->lock);
>                                  lock(sb_writers#4);
>                                  lock(&p->lock);
>     lock(&sig->exec_update_mutex);
>
>    *** DEADLOCK ***
>
> No?  The only thing that is missing is that writer that causes the
> exec_update_mutex readers to block each other - exactly like they did
> when it was a mutex.
>
> But I may be missing something entirely obvious that keeps this from 
> happening.

I don't think so.

It does look like my change makes it one step more difficult to cause
the deadlock, but the deadlock can still happen. :-(

Making it a reader/writer lock both makes it more difficult to cause
a deadlock and makes a lot of sense from a documentation standpoint,
so I still plan on merging the changes after the weekend.


It looks like we do need to have a close look at perf_event_open and
everything associated with it.

Eric


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Davidlohr Bueso

On Fri, 04 Dec 2020, Linus Torvalds wrote:


On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger wrote:



>    perf_event_open  (exec_update_mutex -> ovl_i_mutex)


Side note: this one looks like it should be easy to fix.

Is there any real reason why exec_update_mutex is actually gotten that
early, and held for that long in the perf event code?


afaict just to validate the whole operation early. Per 79c9ce57eb2 the
mutex will guard the check and the perf_install_in_context vs exec.



I _think_ we could move the ptrace check to be much later, to _just_ before that

* This is the point on no return; we cannot fail hereafter.

point in the perf event install chain..


Peter had the idea of doing the ptrace_may_access() check twice: first
lockless and early, then under exec_update_mutex when it mattered right
before perf_install_in_context():

https://lore.kernel.org/linux-fsdevel/20200828123720.gz1362...@hirez.programming.kicks-ass.net/



I don't think it needs to be moved down even that much, I think it
would be sufficient to move it down below the "perf_event_alloc()",
but I didn't check very much.


Yeah we could just keep a single ptrace_may_access() check just further
down until it won't deadlock.

Thanks,
Davidlohr


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Linus Torvalds
On Fri, Dec 4, 2020 at 12:30 PM Bernd Edlinger wrote:
>
> >    perf_event_open  (exec_update_mutex -> ovl_i_mutex)

Side note: this one looks like it should be easy to fix.

Is there any real reason why exec_update_mutex is actually gotten that
early, and held for that long in the perf event code?

I _think_ we could move the ptrace check to be much later, to _just_ before that

 * This is the point on no return; we cannot fail hereafter.

point in the perf event install chain..

I don't think it needs to be moved down even that much, I think it
would be sufficient to move it down below the "perf_event_alloc()",
but I didn't check very much.

The fact that create_local_trace_uprobe() can end up going into a
lookup of an OVL filesystem path smells kind of odd to me to begin
with, but I didn't look at the whole thing.
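
For reference, the chain by which perf_event_open() can end up in a
filesystem lookup, roughly (function names from the tree around this
time; treat the exact chain as an assumption):

	/*
	 * perf_event_open()
	 *   perf_event_alloc()
	 *     perf_init_event()
	 *       perf_uprobe_event_init()        // the perf_uprobe PMU
	 *         perf_uprobe_init()
	 *           create_local_trace_uprobe()
	 *             kern_path()               // the path lookup can
	 *                                       // descend into overlayfs
	 */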

PeterZ, is there something I'm missing?

  Linus


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Bernd Edlinger
On 12/4/20 9:10 PM, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman wrote:
>>
>> From a deadlock perspective the change is strictly better than what we
>> have today.  The readers will no longer block on each other.
> 
> No, agreed, it's better regardless.
> 
>> For the specific case that syzbot reported it is readers who were
> blocking on each other, so that specific case is fixed.
> 
> So the thing is, a reader can still block another reader if a writer
> comes in between them. Which is why I was thinking that the same
> deadlock could happen if somebody does an execve at just the right
> point.
> 
>> On the write side of exec_update_lock we have:
>>
>> cred_guard_mutex -> exec_update_lock
>>
>> Which means that to get an ABBA deadlock cred_guard_mutex would need to
>> be involved
> 
> No, see above: you can get a deadlock with
> 
>  - first reader gets exec_update_lock
> 
>  - writer on exec_update_lock blocks on first reader (this is exec)
> 
>  - second reader of exec_update_lock now blocks on the writer.
> 
> So if that second reader holds something that the first one wants to
> get (or is the same thread as the first one), you have a deadlock: the
> first reader will never make any progress, will never release the
> lock, and the writer will never get it, and the second reader will
> forever wait for the writer that is ahead of it in the queue.
> 
> cred_guard_mutex is immaterial, it's not involved in the deadlock.
> Yes, the writer holds it, but it's not relevant for anything else.
> 
> And that deadlock looks very much like what syzcaller detected, in
> exactly that scenario:
> 
>   Chain exists of:
>     &sig->exec_update_mutex --> sb_writers#4 --> &p->lock
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                    CPU1
>          ----                    ----
>     lock(&p->lock);
>                                  lock(sb_writers#4);
>                                  lock(&p->lock);
>     lock(&sig->exec_update_mutex);
> 
>    *** DEADLOCK ***
> 
> No?  The only thing that is missing is that writer that causes the
> exec_update_mutex readers to block each other - exactly like they did
> when it was a mutex.
> 
> But I may be missing something entirely obvious that keeps this from 
> happening.
> 

I think you convinced me:

>    perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>    chown            (ovl_i_mutex       -> sb_writers)
>    sendfile         (sb_writers        -> p->lock)
>      by reading from a proc file and writing to overlayfs
>    proc_pid_syscall (p->lock           -> exec_update_mutex)


We need just 5 Philosophers (A-E):

Philosopher A calls proc_pid_syscall, takes p->lock, and falls asleep
for a while now.

Philosopher B calls sendfile, takes sb_writers, and has to wait on
p->lock.

Philosopher C calls chown, takes ovl_i_mutex, and has to wait on
sb_writers.

Philosopher D calls perf_event_open, takes down_read(exec_update_lock),
and has to wait on ovl_i_mutex.

Philosopher E calls exec, and wants to take down_write(exec_update_lock),
so has to wait for Philosopher D.

Now Philosopher A wakes up from his sleep, and wants to take
down_read(exec_update_lock), but due to fairness reasons, he has to wait
until Philosopher E gets and releases his write lock again.

If Philosopher A is now blocked, we have a deadlock, right?


Bernd.



>  Linus
> 


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Linus Torvalds
On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman wrote:
>
> From a deadlock perspective the change is strictly better than what we
> have today.  The readers will no longer block on each other.

No, agreed, it's better regardless.

> For the specific case that syzbot reported it is readers who were
> blocking on each other, so that specific case is fixed.

So the thing is, a reader can still block another reader if a writer
comes in between them. Which is why I was thinking that the same
deadlock could happen if somebody does an execve at just the right
point.

> On the write side of exec_update_lock we have:
>
> cred_guard_mutex -> exec_update_lock
>
> Which means that to get an ABBA deadlock cred_guard_mutex would need to
> be involved

No, see above: you can get a deadlock with

 - first reader gets exec_update_lock

 - writer on exec_update_lock blocks on first reader (this is exec)

 - second reader of exec_update_lock now blocks on the writer.

So if that second reader holds something that the first one wants to
get (or is the same thread as the first one), you have a deadlock: the
first reader will never make any progress, will never release the
lock, and the writer will never get it, and the second reader will
forever wait for the writer that is ahead of it in the queue.

cred_guard_mutex is immaterial, it's not involved in the deadlock.
Yes, the writer holds it, but it's not relevant for anything else.

And that deadlock looks very much like what syzcaller detected, in
exactly that scenario:

  Chain exists of:
    &sig->exec_update_mutex --> sb_writers#4 --> &p->lock

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&p->lock);
                                 lock(sb_writers#4);
                                 lock(&p->lock);
    lock(&sig->exec_update_mutex);

   *** DEADLOCK ***

No?  The only thing that is missing is that writer that causes the
exec_update_mutex readers to block each other - exactly like they did
when it was a mutex.

But I may be missing something entirely obvious that keeps this from happening.

 Linus


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Eric W. Biederman
Linus Torvalds writes:

> On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger wrote:
>>
>> >
>> > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
>> > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>> >  {
>> > -	if (likely(m2 != m1))
>> > -		mutex_unlock(m2);
>> > -	mutex_unlock(m1);
>> > +	if (likely(l2 != l1))
>>
>> is this still necessary ?
>>
>> > +		up_read(l2);
>> > +	up_read(l1);
>> >  }
>> >
>> > -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
>> > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>> >  {
>> >  	int err;
>> >
>> > -	if (m2 > m1)
>> > -		swap(m1, m2);
>> > +	if (l2 > l1)
>> > +		swap(l1, l2);
>>
>> and this is probably also no longer necessary?
>
> These are still necessary, because even a recursive read lock can
> still block on a writer trying to come in between the two read locks
> due to fairness guarantees.
>
> So taking the same read lock twice is still a source of possible deadlocks.
>
> For the same reason, read locks still have ABBA deadlock and need to
> be taken in order.
>
> So switching from a mutex to a rwlock doesn't really change the
> locking rules in this respect.

Thinking about the specific case of down_read on two instances of
exec_update_lock: if there are two instances of kcmp running, called
with the same two pids but in opposite order, and the tasks that both
of those pids refer to both exec, you could definitely get a deadlock.

So yes.  We definitely need to keep the swap as well.
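
Concretely, with A and B the two signal_structs (an illustrative
interleaving):

	/*
	 *   kcmp caller X: down_read(&A->exec_update_lock);   // granted
	 *   kcmp caller Y: down_read(&B->exec_update_lock);   // granted
	 *   exec in A:     down_write(&A->exec_update_lock);  // queued behind X
	 *   exec in B:     down_write(&B->exec_update_lock);  // queued behind Y
	 *   kcmp caller X: down_read(&B->exec_update_lock);   // behind B's writer
	 *   kcmp caller Y: down_read(&A->exec_update_lock);   // behind A's writer
	 *
	 * X waits on Y (via B), Y waits on X (via A): ABBA.  Taking the
	 * locks in address order in kcmp_lock() prevents the crossed
	 * ordering.
	 */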

> In fact, I'm not convinced this change even fixes the deadlock that
> syzbot reported, for the same reason: it just requires a write lock in
> between two read locks to deadlock.

From a deadlock perspective the change is strictly better than what we
have today.  The readers will no longer block on each other.

For the specific case that syzbot reported it is readers who were
blocking on each other, so that specific case is fixed.



On the write side of exec_update_lock we have:

cred_guard_mutex -> exec_update_lock

Which means that to get an ABBA deadlock cred_guard_mutex would need to
be involved and it is only acquired in 3 places: ptrace_attach,
do_seccomp, and proc_pid_attr_write.  In none of the 3 from the syscall
entry point until the code that takes cred_guard_mutex can I find
anything that takes any locks.  Perhaps there is something in io_uring;
I did not completely trace that write path.

So given that the exec path would need to be involved, and that the
exec path takes exec_update_lock pretty much at the top level, I am not
seeing how there is any room for deadlocks after this change.

Am I missing something?

Eric


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Eric W. Biederman
Bernd Edlinger writes:

> Hi Eric,
>
> I think I remembered from a previous discussion about this topic,
> that it was unclear if the rw_semaphores are working the same
> in RT-Linux.  Will this fix work in RT as well?

The locks should work close enough to the same that correct code is also
correct code on RT-linux.  If not it is an RT-linux bug.

An rw_semaphore may be less than optimal on RT-linux.  I do remember
that mutexes are preferred.  But this change is more about correctness
than anything else.

> On 12/3/20 9:12 PM, Eric W. Biederman wrote:
>> --- a/kernel/kcmp.c
>> +++ b/kernel/kcmp.c
>> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx)
>>  	return file;
>>  }
>>  
>> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
>> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>>  {
>> -	if (likely(m2 != m1))
>> -		mutex_unlock(m2);
>> -	mutex_unlock(m1);
>> +	if (likely(l2 != l1))
>
> is this still necessary ?

Yes.  Both pids could be threads of the same process, or even the same
value, so yes this is definitely necessary.  rw_semaphores don't nest on
the same cpu.

>
>> +		up_read(l2);
>> +	up_read(l1);
>>  }
>>  
>> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
>> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
>> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>>  {
>>  	int err;
>>  
>> -	if (m2 > m1)
>> -		swap(m1, m2);
>> +	if (l2 > l1)
>> +		swap(l1, l2);
>
> and this is probably also no longer necessary?

I think lockdep needs this, so it can be certain the same rw_semaphore
is not nesting on the cpu.  Otherwise we will have inconsistencies about
which is the nested lock.  It won't matter in practice, but I am not
certain lockdep knows enough to tell the difference.

If anything removing the swap is a candidate for a follow up patch
where it can be considered separately from other concerns.  For this
patch keeping the logic unchanged makes it trivial to verify that
the conversion from one lock to another is correct.

>>  
>> -	err = mutex_lock_killable(m1);
>> -	if (!err && likely(m1 != m2)) {
>> -		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
>> +	err = down_read_killable(l1);
>> +	if (!err && likely(l1 != l2)) {
>
> and this can now be unconditionally, right?

Nope.  The two locks can be the same lock, and they don't nest on a
single cpu.  I tested and verified that lockdep complains bitterly
if down_read_killable_nested is replaced with a simple
down_read_killable.


>> +		err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
>>  		if (err)
>> -			mutex_unlock(m1);
>> +			up_read(l1);
>>  	}
>>  
>>  	return err;
>> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>>  	/*
>>  	 * One should have enough rights to inspect task details.
>>  	 */
>> -	ret = kcmp_lock(&task1->signal->exec_update_mutex,
>> -			&task2->signal->exec_update_mutex);
>> +	ret = kcmp_lock(&task1->signal->exec_update_lock,
>> +			&task2->signal->exec_update_lock);
>>  	if (ret)
>>  		goto err;
>>  	if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
>> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>>  	}
>>  
>>  err_unlock:
>> -	kcmp_unlock(&task1->signal->exec_update_mutex,
>> -		    &task2->signal->exec_update_mutex);
>> +	kcmp_unlock(&task1->signal->exec_update_lock,
>> +		    &task2->signal->exec_update_lock);
>>  err:
>>  	put_task_struct(task1);
>>  	put_task_struct(task2);
>
>
> Thanks
> Bernd.


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Linus Torvalds
On Fri, Dec 4, 2020 at 8:08 AM Bernd Edlinger wrote:
>
> >
> > -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> > +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
> >  {
> > -	if (likely(m2 != m1))
> > -		mutex_unlock(m2);
> > -	mutex_unlock(m1);
> > +	if (likely(l2 != l1))
>
> is this still necessary ?
>
> > +		up_read(l2);
> > +	up_read(l1);
> >  }
> >
> > -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> > +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
> >  {
> >  	int err;
> >
> > -	if (m2 > m1)
> > -		swap(m1, m2);
> > +	if (l2 > l1)
> > +		swap(l1, l2);
>
> and this is probably also no longer necessary?

These are still necessary, because even a recursive read lock can
still block on a writer trying to come in between the two read locks
due to fairness guarantees.

So taking the same read lock twice is still a source of possible deadlocks.

For the same reason, read locks still have ABBA deadlock and need to
be taken in order.

So switching from a mutex to a rwlock doesn't really change the
locking rules in this respect.

In fact, I'm not convinced this change even fixes the deadlock that
syzbot reported, for the same reason: it just requires a write lock in
between two read locks to deadlock.

   Linus


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Bernd Edlinger
Hi Eric,

I think I remembered from a previous discussion about this topic,
that it was unclear if the rw_semaphores are working the same
in RT-Linux.  Will this fix work in RT as well?

On 12/3/20 9:12 PM, Eric W. Biederman wrote:
> --- a/kernel/kcmp.c
> +++ b/kernel/kcmp.c
> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx)
>  	return file;
>  }
>  
> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
> -	if (likely(m2 != m1))
> -		mutex_unlock(m2);
> -	mutex_unlock(m1);
> +	if (likely(l2 != l1))

is this still necessary ?

> +		up_read(l2);
> +	up_read(l1);
>  }
>  
> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
>  	int err;
>  
> -	if (m2 > m1)
> -		swap(m1, m2);
> +	if (l2 > l1)
> +		swap(l1, l2);

and this is probably also no longer necessary?


>  
> -	err = mutex_lock_killable(m1);
> -	if (!err && likely(m1 != m2)) {
> -		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
> +	err = down_read_killable(l1);
> +	if (!err && likely(l1 != l2)) {

and this can now be unconditionally, right?

> +		err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
>  		if (err)
> -			mutex_unlock(m1);
> +			up_read(l1);
>  	}
>  
>  	return err;
> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  	/*
>  	 * One should have enough rights to inspect task details.
>  	 */
> -	ret = kcmp_lock(&task1->signal->exec_update_mutex,
> -			&task2->signal->exec_update_mutex);
> +	ret = kcmp_lock(&task1->signal->exec_update_lock,
> +			&task2->signal->exec_update_lock);
>  	if (ret)
>  		goto err;
>  	if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>  	}
>  
>  err_unlock:
> -	kcmp_unlock(&task1->signal->exec_update_mutex,
> -		    &task2->signal->exec_update_mutex);
> +	kcmp_unlock(&task1->signal->exec_update_lock,
> +		    &task2->signal->exec_update_lock);
>  err:
>  	put_task_struct(task1);
>  	put_task_struct(task2);


Thanks
Bernd.


[PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-03 Thread Eric W. Biederman


Recently syzbot reported[0] that there is a deadlock amongst the users
of exec_update_mutex.  The problematic lock ordering found by lockdep
was:

   perf_event_open  (exec_update_mutex -> ovl_i_mutex)
   chown            (ovl_i_mutex       -> sb_writers)
   sendfile         (sb_writers        -> p->lock)
     by reading from a proc file and writing to overlayfs
   proc_pid_syscall (p->lock           -> exec_update_mutex)

While looking at possible solutions it occurred to me that all of the
users and possible users involved only wanted the state of the given
process to remain the same.  They are all readers.  The only writer is
exec.

There is no reason for readers to block on each other.  So fix
this deadlock by transforming exec_update_mutex into a rw_semaphore
named exec_update_lock that only exec takes for writing.

Cc: Jann Horn 
Cc: Vasiliy Kulikov 
Cc: Al Viro 
Cc: Bernd Edlinger 
Cc: Oleg Nesterov 
Cc: Christopher Yeoh 
Cc: Cyrill Gorcunov 
Cc: Sargun Dhillon 
Cc: Christian Brauner 
Cc: Arnd Bergmann 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex")
[0] https://lkml.kernel.org/r/63640c05ade8e...@google.com
Reported-by: syzbot+db9cdf3dd1f64252c...@syzkaller.appspotmail.com
Signed-off-by: Eric W. Biederman 
---
 fs/exec.c                    | 12 ++++++------
 fs/proc/base.c               | 10 +++++-----
 include/linux/sched/signal.h | 11 ++++++-----
 init/init_task.c             |  2 +-
 kernel/events/core.c         | 12 ++++++------
 kernel/fork.c                |  6 +++---
 kernel/kcmp.c                | 30 +++++++++++++++---------------
 kernel/pid.c                 |  4 ++--
 8 files changed, 44 insertions(+), 43 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 9e9368603168..b822aa0d682e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -965,8 +965,8 @@ EXPORT_SYMBOL(read_code);
 
 /*
  * Maps the mm_struct mm into the current task struct.
- * On success, this function returns with the mutex
- * exec_update_mutex locked.
+ * On success, this function returns with exec_update_lock
+ * held for writing.
  */
 static int exec_mmap(struct mm_struct *mm)
 {
@@ -981,7 +981,7 @@ static int exec_mmap(struct mm_struct *mm)
 	if (old_mm)
 		sync_mm_rss(old_mm);
 
-	ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
+	ret = down_write_killable(&tsk->signal->exec_update_lock);
 	if (ret)
 		return ret;
 
@@ -995,7 +995,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmap_read_lock(old_mm);
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
-			mutex_unlock(&tsk->signal->exec_update_mutex);
+			up_write(&tsk->signal->exec_update_lock);
 			return -EINTR;
 		}
 	}
@@ -1392,7 +1392,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	return 0;
 
 out_unlock:
-	mutex_unlock(&me->signal->exec_update_mutex);
+	up_write(&me->signal->exec_update_lock);
 out:
 	return retval;
 }
@@ -1433,7 +1433,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 	 * some architectures like powerpc
 	 */
 	me->mm->task_size = TASK_SIZE;
-	mutex_unlock(&me->signal->exec_update_mutex);
+	up_write(&me->signal->exec_update_lock);
 	mutex_unlock(&me->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(setup_new_exec);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0f707003dda5..dd1f4f6f22bc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -405,11 +405,11 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 static int lock_trace(struct task_struct *task)
 {
-	int err = mutex_lock_killable(&task->signal->exec_update_mutex);
+	int err = down_read_killable(&task->signal->exec_update_lock);
 	if (err)
 		return err;
 	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_FSCREDS)) {
-		mutex_unlock(&task->signal->exec_update_mutex);
+		up_read(&task->signal->exec_update_lock);
 		return -EPERM;
 	}
 	return 0;
@@ -417,7 +417,7 @@ static int lock_trace(struct task_struct *task)
 
 static void unlock_trace(struct task_struct *task)
 {
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 }
 
 #ifdef CONFIG_STACKTRACE
@@ -2928,7 +2928,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int whence)
 	unsigned long flags;
 	int result;
 
-	result = mutex_lock_killable(&task->signal->exec_update_mutex);
+	result = down_read_killable(&task->signal->exec_update_lock);
 	if (result)
 		return result;
 
@@ -2964,7 +2964,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int whence)
 	result = 0;
 
 out_unlock:
-	mutex_unlock(&task->signal->exec_update_mutex);
+	up_read(&task->signal->exec_update_lock);
 	return