On 2017-06-28 15:03, Paul Moore wrote:
> On Tue, Jun 27, 2017 at 5:11 PM, Richard Guy Briggs <r...@redhat.com> wrote:
> > On 2017-05-30 17:21, Paul Moore wrote:
> >> On Tue, Apr 4, 2017 at 5:21 AM, Richard Guy Briggs <r...@redhat.com> wrote:
> 
> ...
> 
> >> > diff --git a/kernel/audit.c b/kernel/audit.c
> >> > index 25dd70a..7d83c5a 100644
> >> > --- a/kernel/audit.c
> >> > +++ b/kernel/audit.c
> >> > @@ -66,6 +66,7 @@
> >> >  #include <linux/freezer.h>
> >> >  #include <linux/pid_namespace.h>
> >> >  #include <net/netns/generic.h>
> >> > +#include <linux/dcache.h>
> >> >
> >> >  #include "audit.h"
> >> >
> >> > @@ -1884,6 +1885,10 @@ void audit_copy_inode(struct audit_names *name, 
> >> > const struct dentry *dentry,
> >> >         name->gid   = inode->i_gid;
> >> >         name->rdev  = inode->i_rdev;
> >> >         security_inode_getsecid(inode, &name->osid);
> >> > +       if (name->dentry) {
> >> > +               dput(name->dentry);
> >> > +               name->dentry = NULL;
> >> > +       }
> >>
> >> Out of curiosity, what terrible things happen if we take a reference
> >> to a non-NULL dentry passed to audit_copy_inode() and store it in
> >> name->dentry?  Does performance tank?
> >
> > Interesting idea.  Right now it is optimized to only take a reference to
> > the dentry's parent dentry in the case we're handed an anonymous entry.
> > Most of the time it will never be used even though we invest in the
> > overhead of taking a reference count.  Besides, __audit_inode_child()
> > hands in a NULL for the dentry parameter to audit_copy_inode().
> 
> [NOTE: audit_copy_inode() hands a NULL dentry only in the anonymous parent 
> case]
> 
> I believe I was just thinking of less conditional handling, especially
> when reference counts are concerned.  I'm just trying to limit future
> headaches, but I suspect the perf cost would be problematic, and as
> you point out, there is no *need* for the majority of cases.
> 
> Looking at this again today, why would we want to clear name->dentry
> in audit_copy_inode() if it is already set?  Does that ever happen?
> I'm not sure it does ...

Ok, I just tried re-writing that part to dput that dentry only for the
child of an anonymous parent rather than whenever audit_copy_inode() was
called and I ended up with a:

        BUG: Dentry ffff8800338f1dc0{i=369a,n=stderr}  still in use (1) 
[unmount of tmpfs tmpfs]
        WARNING: CPU: 0 PID: 387 at fs/dcache.c:1445 umount_check+0x99/0xc0

This was after rebasing on a recent audit-next based on 4.11 rather than
the previous based on a 4.8 audit-next.  Something else changed in the
kernel or kernel config between these two points.  I was getting some
"INFO: suspicious RCU usage." warnings in idmap for NFS on the earlier
kernel and that is no longer happenning, and I'm no longer getting any
tracefs audit PATH entries in the more recent kernel which suggests that
whatever was causing the anonymous parent PATH records from tracefs has
been changed/fixed/configured-out.  This tmpfs Dentry issue happens on
both 4.8 and 4.11 kernels.

So the tmpfs is being unmounted within the time of a task that has taken
an anonymous parent audit_name and it appears that the dput in
audit_copy_inode() called from elsewhere had reset it to avoid
needlessly extending the life of this dentry.

I see two obvious ways to solve this:
        - Return to freeing this dentry from audit_copy_inode()
        - Add tmpfs to the list of filesystems to not create PATH records

I'm really not crazy aobut this second one unless I know why tmpfs is
generating calls with anonymous parents.

For reference, the rest of the call trace is:
        dump_stack+0x85/0xc9
        __warn+0xd1/0xf0
        ? d_genocide_kill+0x40/0x40
        warn_slowpath_null+0x1d/0x20
        umount_check+0x99/0xc0
        d_walk+0x10b/0x580
        ? do_one_tree+0x26/0x40
        do_one_tree+0x26/0x40
        shrink_dcache_for_umount+0x5d/0xd0
        generic_shutdown_super+0x1f/0xf0
        kill_litter_super+0x29/0x40
        deactivate_locked_super+0x43/0x70
        deactivate_super+0x88/0xa0
        cleanup_mnt+0x8e/0xe0
        __cleanup_mnt+0x12/0x20
        task_work_run+0x83/0xc0
        do_exit+0x45c/0xfe0
        ? syscall_trace_enter+0x2e4/0x400
        do_group_exit+0x68/0xe0
        SyS_exit_group+0x14/0x20
        do_syscall_64+0x82/0x270
        entry_SYSCALL64_slow_path+0x25/0x25

> > I'm assuming you are hinting at also using that dentry to compare the
> > audit_names entry, which I think it a bad idea since there could be
> > multiple paths to access a dentry.  I did orignially have another patch
> > that would have tried to use that as well, which didn't seem to hurt,
> > but I didn't think was worth upstreaming.
> 
> No, I wasn't thinking that, the dev/inode numbers should be sufficient
> in those cases I believe; I'm not sure the dentry would help us here.
> 
> >> Also out of curiosity, why do we want to drop a dentry reference here
> >> if one already exists?
> >
> > I think we want to drop a dentry reference here because this inode child
> > could be a subsequent access to the same dentry with a full path,
> > removing the need to cache this dentry information in the first place.
> 
> Related to my comment above from today ... what code path please?

So it appears that there is a code path that does free this dentry in a
timely fashion to avoid needlessly extending its life and simply leaving
it there until the audit_context is freed is too long.

> >> > @@ -1925,6 +1930,17 @@ void audit_log_name(struct audit_context 
> >> > *context, struct audit_names *n,
> >> >                         audit_log_n_untrustedstring(ab, n->name->name,
> >> >                                                     n->name_len);
> >> >                 }
> >> > +       } else if (n->dentry) {
> >> > +               char *fullpath;
> >> > +               const char *fullpathp;
> >> > +
> >> > +               fullpath = kmalloc(PATH_MAX, GFP_KERNEL);
> >> > +               if (!fullpath)
> >> > +                       return;
> >>
> >> I'm wondering if there is some value in still emitting the record if
> >> the kmalloc() fails, just with the name field set as the unset "?"
> >> value, e.g. "name=?".  Thoughts?
> >
> > Possibly.  We've got much bigger problems if that happens, but this
> > sounds like a good defensive coding approach.  I'm even tempted to call
> > audit_panic().
> 
> No audit_panic().  We've still got good information that we can
> record, e.g. dev/inode numbers; let's just print "name=?" and go on
> our way recording the rest of the information.  This is in keeping
> with the current audit_log_name() error handling.
> 
> At the very least you need to clean up here instead of just returning.
> As the patch currently stands I believe this will end up leaking an
> audit_buffer.

This has been fixed along with a fullpath kmalloc leak.

> paul moore

- RGB

--
Richard Guy Briggs <r...@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

Reply via email to