On Sun, 08 Feb 2015 11:07:32 +0800
Ian Kent <[email protected]> wrote:

> On Fri, 2015-02-06 at 07:08 -0500, Jeff Layton wrote:
> > On Thu, 05 Feb 2015 10:34:11 +0800
> > Ian Kent <[email protected]> wrote:
> > 
> > > The call_usermodehelper() function executes all binaries in the
> > > global "init" root context. This doesn't allow a binary to be run
> > > within a namespace (e.g. the namespace of a container).
> > > 
> > > Both the containerized NFS client and the NFS server need the ability
> > > to execute a binary in a container's context. To do this, the init
> > > process of the caller's environment is used to set up the namespaces
> > > in the same way the root init process is used otherwise.
> > > 
> > > Signed-off-by: Ian Kent <[email protected]>
> > > Cc: Benjamin Coddington <[email protected]>
> > > Cc: Al Viro <[email protected]>
> > > Cc: J. Bruce Fields <[email protected]>
> > > Cc: David Howells <[email protected]>
> > > Cc: Trond Myklebust <[email protected]>
> > > Cc: Oleg Nesterov <[email protected]>
> > > Cc: Eric W. Biederman <[email protected]>
> > > Cc: Jeff Layton <[email protected]>
> > > ---
> > >  include/linux/kmod.h |   16 +++++++
> > >  kernel/kmod.c        |  115 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  2 files changed, 128 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> > > index 15bdeed..b0f1b3c 100644
> > > --- a/include/linux/kmod.h
> > > +++ b/include/linux/kmod.h
> > > @@ -52,6 +52,7 @@ struct file;
> > >  #define UMH_WAIT_EXEC    1       /* wait for the exec, but not the process */
> > >  #define UMH_WAIT_PROC    2       /* wait for the process to complete */
> > >  #define UMH_KILLABLE     4       /* wait for EXEC/PROC killable */
> > > +#define UMH_USE_NS       8       /* exec using caller's init namespace */
> > >  
> > >  struct subprocess_info {
> > >   struct work_struct work;
> > > @@ -69,6 +70,21 @@ struct subprocess_info {
> > >  extern int
> > >  call_usermodehelper(char *path, char **argv, char **envp, int flags);
> > >  
> > > +#if !defined(CONFIG_PROC_FS) || !defined(CONFIG_NAMESPACES)
> > > +static inline struct task_struct *umh_get_init_pid(void)
> > > +{
> > > + return ERR_PTR(-ENOTSUP);
> > > +}
> > > +
> > > +static inline int umh_enter_ns(struct task_struct *tsk, struct cred *new)
> > > +{
> > > + return -ENOTSUP;
> > > +}
> > > +#else
> > > +struct task_struct *umh_get_init_pid(void);
> > > +int umh_enter_ns(struct task_struct *tsk, struct cred *new);
> > > +#endif
> > > +
> > >  extern struct subprocess_info *
> > >  call_usermodehelper_setup(char *path, char **argv, char **envp, gfp_t gfp_mask,
> > >                     int (*init)(struct subprocess_info *info, struct cred *new),
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index 14c0188..4c649d6 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -582,6 +582,98 @@ unlock:
> > >  }
> > >  EXPORT_SYMBOL(call_usermodehelper_exec);
> > >  
> > > +#if defined(CONFIG_PROC_FS) && defined(CONFIG_NAMESPACES)
> > > +#define NS_PATH_MAX      35
> > > +#define NS_PATH_FMT      "%lu/ns/%s"
> > > +
> > > +/* Note namespace name order is significant */
> > > +static const char *ns_names[] = { "user", "ipc", "uts", "net", "pid", "mnt", NULL };
> > > +
> > > +struct task_struct *umh_get_init_pid(void)
> > 
> > nit: we're not getting a pid here but a task_struct pointer. Maybe this
> > should be called umh_get_init_task?
> 
> Ha, yep.
> 
> > 
> > > +{
> > > + struct task_struct *tsk;
> > > +
> > > + rcu_read_lock();
> > > + tsk = find_task_by_vpid(1);
> > > + if (tsk)
> > > +         get_task_struct(tsk);
> > > + rcu_read_unlock();
> > 
> > I'm not terribly familiar with the task_struct lifetime rules...
> > 
> > I assume that you can be assured that tsk won't go away while you hold
> > the rcu_read_lock, but is doing a get_task_struct while holding it
> > sufficient to pin it after you drop the lock?
> > 
> > IOW, could the refcount on the task_struct do a 0->1 transition here and
> > end up being freed anyway after you've grabbed a reference?
> 
> Good point. I thought getting a reference under the read lock would be
> enough, but maybe I need more checks, as I do with dentries. I'll check
> that.
> 

It looks like the rcu_read_lock is actually there mostly to protect the
pid_hash, and get_pid_task() does something very similar. So I think
you're probably fine doing what you're doing in this patch.
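As a rough illustration of why taking the reference before dropping the
lock is enough, here is a userspace sketch. It is a model under stated
assumptions, not kernel code: a pthread mutex stands in for
rcu_read_lock() plus the pid hash, and struct obj, table_insert(),
table_lookup_get(), table_remove() and obj_put() are hypothetical names.
The model assumes the table holds its own reference while the object is
published, so a lookup that finds the object always sees a count of at
least one and never performs a 0->1 transition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct; refcount plays the role of ->usage. */
struct obj {
	atomic_int refcount;
	int id;
};

/* The mutex models rcu_read_lock() + the pid hash; table_slot is a one-entry "hash". */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *table_slot;
static atomic_int freed_count;	/* lets callers observe when an obj is freed */

/* Allocate an object holding one reference, owned by whoever publishes it. */
static struct obj *obj_new(int id)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcount, 1);
	o->id = id;
	return o;
}

/* Drop one reference; the last put frees the object. */
static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);	/* catches unbalanced puts in debug builds */
	if (old == 1) {
		free(o);
		atomic_fetch_add(&freed_count, 1);
	}
}

/* Publish o; the table takes over the initial reference from obj_new(). */
static void table_insert(struct obj *o)
{
	pthread_mutex_lock(&table_lock);
	table_slot = o;
	pthread_mutex_unlock(&table_lock);
}

/*
 * Analogue of get_pid_task(): look up and take a reference before
 * unlocking. While the object is published, the table's reference keeps
 * the count >= 1, so this increment is never a 0->1 transition.
 */
static struct obj *table_lookup_get(int id)
{
	struct obj *o = NULL;

	pthread_mutex_lock(&table_lock);
	if (table_slot && table_slot->id == id) {
		o = table_slot;
		atomic_fetch_add(&o->refcount, 1);
	}
	pthread_mutex_unlock(&table_lock);
	return o;
}

/* Unpublish and drop the table's reference; other holders keep o alive. */
static void table_remove(void)
{
	struct obj *o;

	pthread_mutex_lock(&table_lock);
	o = table_slot;
	table_slot = NULL;
	pthread_mutex_unlock(&table_lock);
	if (o)
		obj_put(o);
}
```

Taking the reference after unlocking would reopen exactly the window
asked about above: the object could be removed and freed between the
lookup and the increment.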

That said, the "What is struct pid?" comment in include/linux/pid.h
is interesting. I wonder if my comments on your original patch were
actually unfounded. If you hold a reference to a struct pid, that might
be enough to ensure that it doesn't get reused, but I'm not sure whether
it could then end up being detached from the task.
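The trade-off described in that pid.h comment can be modelled the same
way. Again this is a hypothetical, single-threaded userspace sketch, not
the kernel's data structures: struct pid_model, struct task_model and
the pid_model_*() helpers are invented for illustration. Holding a
reference to the pid object keeps that object valid, but the task it
points to can still exit and detach, so a pid_task()-style lookup
through it may return NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task; only its identity matters here. */
struct task_model {
	int exit_code;
};

/* Hypothetical stand-in for struct pid (single-threaded model: no locking). */
struct pid_model {
	int refcount;			/* pins this pid object, not the task */
	int nr;				/* numeric pid value */
	struct task_model *task;	/* NULL once the task has detached */
};

/* New pid object attached to t, with one reference for the creator. */
static struct pid_model *pid_model_new(int nr, struct task_model *t)
{
	struct pid_model *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;
	p->nr = nr;
	p->task = t;
	return p;
}

/* Take an extra reference on the pid object. */
static struct pid_model *pid_model_get(struct pid_model *p)
{
	p->refcount++;
	return p;
}

/* Drop a reference; the last put frees the pid object. */
static void pid_model_put(struct pid_model *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		free(p);
}

/* Task exit: detach the task; the pid object lives on while referenced. */
static void pid_model_detach(struct pid_model *p)
{
	p->task = NULL;
}

/* Like pid_task(): may return NULL even though the caller holds a ref. */
static struct task_model *pid_model_task(const struct pid_model *p)
{
	return p->task;
}
```

Pinning the task, as the patch does, keeps the task_struct itself around
even after the process exits; holding the pid instead is cheaper and lets
the holder notice that the task is gone, while a recycled numeric value
gets a new pid object rather than reusing the pinned one.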

I suspect that pinning the actual task like you're doing here is
probably the right thing to do, but I'd certainly value input from
someone who understands the task/pid interaction better than I do.

-- 
Jeff Layton <[email protected]>