On Fri, Dec 04, 2020 at 02:31:55AM +0800, Wen Yang wrote:
> From: Christian Brauner <[email protected]>
> 
> [ Upstream commit b3e5838252665ee4cfa76b82bdf1198dca81e5be ]
> 
> This patchset makes it possible to retrieve pid file descriptors at
> process creation time by introducing the new flag CLONE_PIDFD to the
> clone() system call.  Linus originally suggested to implement this as a
> new flag to clone() instead of making it a separate system call.  As
> spotted by Linus, there is exactly one bit for clone() left.
> 
> CLONE_PIDFD creates file descriptors based on the anonymous inode
> implementation in the kernel that will also be used to implement the new
> mount api.  They serve as a simple opaque handle on pids.  Logically,
> this makes it possible to interpret a pidfd differently, narrowing or
> widening the scope of various operations (e.g. signal sending).  Thus, a
> pidfd can refer not just to a tgid, but also to a tid, or in theory - given
> appropriate flag arguments in relevant syscalls - to a process group or
> session. A pidfd does not represent a privilege.  This does not imply it
> cannot ever be that way but for now this is not the case.
> 
> A pidfd comes with additional information in fdinfo if the kernel supports
> procfs.  The fdinfo file contains the pid of the process in the caller's
> pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
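
For readers following along, the "Pid:\t%d" format described above can be
picked out of fdinfo text with a few lines of C. This is only a minimal
sketch and is not part of the patch: the helper name parse_fdinfo_pid() is
hypothetical, and it assumes the caller has already read the fdinfo file
into a NUL-terminated buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): scan fdinfo-style text line by
 * line and extract the value of the "Pid:" field.
 * Returns 0 and stores the value in *pid_out on success, -1 if no line
 * starting with "Pid:" is found.
 */
static int parse_fdinfo_pid(const char *text, int *pid_out)
{
	const char *line = text;

	assert(text != NULL && pid_out != NULL);

	while (*line) {
		int pid;
		const char *next;

		/* Matches only a line that begins with "Pid:" followed by a number. */
		if (sscanf(line, "Pid:\t%d", &pid) == 1) {
			*pid_out = pid;
			return 0;
		}

		/* Advance to the start of the next line, if there is one. */
		next = strchr(line, '\n');
		if (!next)
			break;
		line = next + 1;
	}
	return -1;
}
```

A caller would presumably read /proc/self/fdinfo/<fd> for the pidfd into a
buffer and pass that buffer to the helper; the other lines in that file are
simply skipped.
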
> 
> As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
> parent_tidptr argument of clone.  This has the advantage that we can
> give back the associated pid and the pidfd at the same time.
> 
> To remove worries about missing metadata access this patchset comes with
> a sample program that illustrates how a combination of CLONE_PIDFD and
> pidfd_send_signal() can be used to gain race-free access to process
> metadata through /proc/<pid>.  The sample program can easily be
> translated into a helper that would be suitable for inclusion in libc so
> that users don't have to worry about writing it themselves.
> 
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
> Co-developed-by: Jann Horn <[email protected]>
> Signed-off-by: Jann Horn <[email protected]>
> Reviewed-by: Oleg Nesterov <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: "Eric W. Biederman" <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: "Michael Kerrisk (man-pages)" <[email protected]>
> Cc: Andy Lutomirsky <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Aleksa Sarai <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: <[email protected]> # 4.9.x
> (clone: fix up cherry-pick conflicts for b3e583825266)
> Signed-off-by: Wen Yang <[email protected]>
> ---
>  include/linux/pid.h        |   1 +
>  include/uapi/linux/sched.h |   1 +
>  kernel/fork.c              | 119 +++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 117 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 97b745d..7599a78 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -73,6 +73,7 @@ struct pid_link
>       struct hlist_node node;
>       struct pid *pid;
>  };
> +extern const struct file_operations pidfd_fops;
>  
>  static inline struct pid *get_pid(struct pid *pid)
>  {
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 5f0fe01..ed6e31d 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -9,6 +9,7 @@
>  #define CLONE_FS     0x00000200      /* set if fs info shared between processes */
>  #define CLONE_FILES  0x00000400      /* set if open files shared between processes */
>  #define CLONE_SIGHAND        0x00000800      /* set if signal handlers and blocked signals shared */
> +#define CLONE_PIDFD  0x00001000      /* set if a pidfd should be placed in parent */
>  #define CLONE_PTRACE 0x00002000      /* set if we want to let tracing continue on the child too */
>  #define CLONE_VFORK  0x00004000      /* set if the parent wants the child to wake it up on mm_release */
>  #define CLONE_PARENT 0x00008000      /* set if we want to have the same parent as the cloner */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b64efec..076297a 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -11,7 +11,22 @@
>   * management can be a bitch. See 'mm/memory.c': 'copy_page_range()'
>   */
>  
> +#include <linux/anon_inodes.h>
>  #include <linux/slab.h>
> +#if 0
> +#include <linux/sched/autogroup.h>
> +#include <linux/sched/mm.h>
> +#include <linux/sched/coredump.h>
> +#include <linux/sched/user.h>
> +#include <linux/sched/numa_balancing.h>
> +#include <linux/sched/stat.h>
> +#include <linux/sched/task.h>
> +#include <linux/sched/task_stack.h>
> +#include <linux/sched/cputime.h>
> +#include <linux/seq_file.h>
> +#include <linux/rtmutex.h>
> +>>>>>>> b3e58382... clone: add CLONE_PIDFD
> +#endif

That looks odd :(  The "#if 0" block still has a leftover merge-conflict
marker (">>>>>>> b3e58382...") in it.

Can you please refresh this patch series, and make sure it is correct
and resend it?

thanks,

greg k-h
