Hello Sargun,,

On 10/29/20 9:53 AM, Sargun Dhillon wrote:
> On Mon, Oct 26, 2020 at 10:55:04AM +0100, Michael Kerrisk (man-pages) wrote:

[...]

>>    ioctl(2) operations
>>        The following ioctl(2) operations are provided to support seccomp
>>        user-space notification.  For each of these operations, the first
>>        (file descriptor) argument of ioctl(2) is the listening file
>>        descriptor returned by a call to seccomp(2) with the
>>        SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
>>
>>        SECCOMP_IOCTL_NOTIF_RECV
>>               This operation is used to obtain a user-space notification
>>               event.  If no such event is currently pending, the
>>               operation blocks until an event occurs.  The third
>>               ioctl(2) argument is a pointer to a structure of the
>>               following form which contains information about the event.
>>               This structure must be zeroed out before the call.
>>
>>                   struct seccomp_notif {
>>                       __u64  id;              /* Cookie */
>>                       __u32  pid;             /* TID of target thread */
>>                       __u32  flags;           /* Currently unused (0) */
>>                       struct seccomp_data data;   /* See seccomp(2) */
>>                   };
>>
>>               The fields in this structure are as follows:
>>
>>               id     This is a cookie for the notification.  Each such
>>                      cookie is guaranteed to be unique for the
>>                      corresponding seccomp filter.
>>
>>                      · It can be used with the
>>                        SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation
>>                        to verify that the target is still alive.
>>
>>                      · When returning a notification response to the
>>                        kernel, the supervisor must include the cookie
>>                        value in the seccomp_notif_resp structure that is
>>                        specified as the argument of the
>>                        SECCOMP_IOCTL_NOTIF_SEND operation.
>>
>>               pid    This is the thread ID of the target thread that
>>                      triggered the notification event.
>>
>>               flags  This is a bit mask of flags providing further
>>                      information on the event.  In the current
>>                      implementation, this field is always zero.
>>
>>               data   This is a seccomp_data structure containing
>>                      information about the system call that triggered
>>                      the notification.  This is the same structure that
>>                      is passed to the seccomp filter.  See seccomp(2)
>>                      for details of this structure.
>>
>>               On success, this operation returns 0; on failure, -1 is
>>               returned, and errno is set to indicate the cause of the
>>               error.  This operation can fail with the following errors:
>>
>>               EINVAL (since Linux 5.5)
>>                      The seccomp_notif structure that was passed to the
>>                      call contained nonzero fields.
>>
>>               ENOENT The target thread was killed by a signal as the
>>                      notification information was being generated, or
>>                      the target's (blocked) system call was interrupted
>>                      by a signal handler.
>>
>>        ┌─────────────────────────────────────────────────────┐
>>        │FIXME                                                │
>>        ├─────────────────────────────────────────────────────┤
>>        │From my experiments, it appears that if a            │
>>        │SECCOMP_IOCTL_NOTIF_RECV is done after the target    │
>>        │thread terminates, then the ioctl() simply blocks    │
>>        │(rather than returning an error to indicate that the │
>>        │target no longer exists).                            │
>>        │                                                     │
>>        │I found that surprising, and it required some        │
>>        │contortions in the example program.  It was not      │
>>        │possible to code my SIGCHLD handler (which reaps the │
>>        │zombie when the worker/target terminates) to simply  │
>>        │set a flag checked in the main handleNotifications() │
>>        │loop, since this created an unavoidable race where   │
>>        │the child might terminate just after I had checked   │
>>        │the flag, but before I blocked (forever!) in the     │
>>        │SECCOMP_IOCTL_NOTIF_RECV operation. Instead, I had   │
>>        │to code the signal handler to simply call _exit(2)   │
>>        │in order to terminate the parent process (the        │
>>        │supervisor).                                         │
>>        │                                                     │
>>        │Is this expected behavior? It seems to me rather     │
>>        │desirable that SECCOMP_IOCTL_NOTIF_RECV should give  │
>>        │an error if the target has terminated.               │
>>        │                                                     │
>>        │Jann posted a patch to rectify this, but there was   │
>>        │no response (Lore link: https://bit.ly/3jvUBxk) to   │
>>        │his question about fixing this issue. (I've tried    │
>>        │building with the patch, but encountered an issue    │
>>        │with the target process entering D state after a     │
>>        │signal.)                                             │
>>        │                                                     │
>>        │For now, this behavior is documented in BUGS.        │
>>        │                                                     │
>>        │Kees Cook commented: Let's change [this] ASAP!       │
>>        └─────────────────────────────────────────────────────┘
>>
> 
> I think I commented in another thread somewhere that the supervisor is not 
> notified if the syscall is preempted. Therefore if it is performing a 
> preemptible, long-running syscall, you need to poll
> SECCOMP_IOCTL_NOTIF_ID_VALID in the background, otherwise you can
> end up in a bad situation -- like leaking resources, or holding on to
> file descriptors after the program under supervision has intended to
> release them.

It's been a long day, and I'm not sure I reallu understand this.
Could you outline the scnario in more detail?

> A very specific example is if you're performing an accept on behalf
> of the program generating the notification, and the program intends
> to reuse the port. You can get into all sorts of awkward situations
> there.

[...]

>       SECCOMP_IOCTL_NOTIF_ADDFD (Since Linux v5.9)
>               This operations is used by the supervisor to add a file
>               descriptor to the process that generated the notification.
>               This can be used by the supervisor to enable "emulation"
>               [Probably a better word] of syscalls which return file
>               descriptors, such as socket(2), or open(2).
> 
>               When the file descriptor is received by the process that
>               is associated with the notification / cookie, it follows
>               SCM_RIGHTS like semantics, and is evaluated by MAC.

I'm not sure what you mean by SCM_RIGHTS like semantics. Do you mean,
the file descriptor refers to the same open file description
('struct file')?

"is evaluated by MAC"... Do you mean something like: the FD is 
subject  to LSM checks?

>               In addition, if it is a socket, it inherits the cgroup
>               v1 classid and netprioidx of the receiving process.
> 
>               The argument of this is as follows:
> 
>                       struct seccomp_notif_addfd {
>                               __u64 id;
>                               __u32 flags;
>                               __u32 srcfd;
>                               __u32 newfd;
>                               __u32 newfd_flags;
>                       };
> 
>               id
>                       This is the cookie value that was obtained using
>                       SECCOMP_IOCTL_NOTIF_RECV.
> 
>               flags
>                       A bitmask that includes zero or more of the
>                       SECCOMP_ADDFD_FLAG_* bits set
> 
>                       SECCOMP_ADDFD_FLAG_SETFD - Use dup2 (or dup3?)
>                               like semantics when copying the file
>                               descriptor.
> 
>               srcfd
>                       The file descriptor number to copy in the
>                       supervisor process.
> 
>               newfd
>                       If the SECCOMP_ADDFD_FLAG_SETFD flag is specified
>                       this will be the file descriptor that is used
>                       in the dup2 semantics. If this file descriptor
>                       exists in the receiving process, it is closed
>                       and replaced by this file descriptor in an
>                       atomic fashion. If the copy process fails
>                       due to a MAC failure, or if srcfd is invalid,
>                       the newfd will not be closed in the receiving
>                       process.

Great description!

>                       If SECCOMP_ADDFD_FLAG_SETFD it not set, then
>                       this value must be 0.
> 
>               newfd_flags
>                       The file descriptor flags to set on
>                       the file descriptor after it has been received
>                       by the process. The only flag that can currently
>                       be specified is O_CLOEXEC.
> 
>               On success, this operation returns the file descriptor
>               number in the receiving process. On failure, -1 is returned.
> 
>               It can fail with the following error codes:
> 
>               EINPROGRESS
>                       The cookie number specified hasn't been received
>                       by the listener

I don't understand this. Can you say more about the scenario?

>               ENOENT
>                       The cookie number is not valid. This can happen
>                       if a response has already been sent, or if the
>                       syscall was interrupted
> 
>               EBADF
>                       If the file descriptor specified in srcfd is
>                       invalid, or if the fd is out of range of the
>                       destination program.

The piece "or if the fd is out of range of the destination
program" is not clear to me. Can you say some more please.

>               EINVAL
>                       If flags or new_flags were unrecognized, or
>                       if newfd is non-zero, and SECCOMP_ADDFD_FLAG_SETFD
>                       has not been set.
> 
>               EMFILE
>                       Too many files are open by the destination process.
> 
>               [there's other error codes possible, like from the LSMs
>                or if memory can't be read / written or ebusy]
>                
> Does this help?

It's a good start!

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Reply via email to