On Fri, Jun 07, 2019 at 03:17:40PM +0100, David Howells wrote:
> Add UAPI definitions for the general notification ring, including the
> following pieces:
> 
>  (1) struct watch_notification.
> 
>      This is the metadata header for each entry in the ring.  It includes a
>      type and subtype that indicate the source of the message
>      (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
>      (eg. NOTIFY_MOUNT_NEW_MOUNT).
> 
>      The header also contains an information field that conveys the
>      following information:
> 
>       - WATCH_INFO_LENGTH.  The size of the entry (entries are variable
>           length).
> 
>       - WATCH_INFO_OVERRUN.  If preceding messages were lost due to ring
>         overrun or lack of memory.
> 
>       - WATCH_INFO_ENOMEM.  If preceding messages were lost due to lack
>           of memory.
> 
>       - WATCH_INFO_RECURSIVE.  If the event detected was applied to
>           multiple objects (eg. a recursive change to mount attributes).
> 
>       - WATCH_INFO_IN_SUBTREE.  If the event didn't happen at the watched
>           object, but rather to some related object (eg. a subtree mount
>           watch saw a mount happen somewhere within the subtree).
> 
>       - WATCH_INFO_TYPE_FLAGS.  Eight flags whose meanings depend on the
>           message type.
> 
>       - WATCH_INFO_ID.  The watch ID specified when the watchpoint was
>           set.
> 
>      All the information in the header can be used in filtering messages at
>      the point of writing into the buffer.
> 
>  (2) struct watch_queue_buffer.
> 
>      This describes the layout of the ring.  Note that the first slots in
>      the ring contain a special metadata entry that contains the ring
>      pointers.  The producer in the kernel knows to skip this and it has a
>      proper header (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION) that
>      indicates the size so that the ring consumer can handle it the same as
>      any other record and just skip it.
> 
>      Note that this means that ring entries can never be split over the end
>      of the ring, so if an entry would need to be split, a skip record is
>      inserted to wrap the ring first; this is also WATCH_TYPE_META,
>      WATCH_META_SKIP_NOTIFICATION.
> 
> Signed-off-by: David Howells <dhowe...@redhat.com>

I'm starting by reading the uapi changes and the sample program...

> ---
> 
>  include/uapi/linux/watch_queue.h |   63 
> ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>  create mode 100644 include/uapi/linux/watch_queue.h
> 
> diff --git a/include/uapi/linux/watch_queue.h 
> b/include/uapi/linux/watch_queue.h
> new file mode 100644
> index 000000000000..c3a88fa5f62a
> --- /dev/null
> +++ b/include/uapi/linux/watch_queue.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_WATCH_QUEUE_H
> +#define _UAPI_LINUX_WATCH_QUEUE_H
> +
> +#include <linux/types.h>
> +
> +enum watch_notification_type {
> +     WATCH_TYPE_META         = 0,    /* Special record */
> +     WATCH_TYPE_MOUNT_NOTIFY = 1,    /* Mount notification record */
> +     WATCH_TYPE_SB_NOTIFY    = 2,    /* Superblock notification */
> +     WATCH_TYPE_KEY_NOTIFY   = 3,    /* Key/keyring change notification */
> +     WATCH_TYPE_BLOCK_NOTIFY = 4,    /* Block layer notifications */
> +#define WATCH_TYPE___NR 5

Given the way enums work I think you can just make WATCH_TYPE___NR the
last element in the enum?

> +};
> +
> +enum watch_meta_notification_subtype {
> +     WATCH_META_SKIP_NOTIFICATION    = 0,    /* Just skip this record */
> +     WATCH_META_REMOVAL_NOTIFICATION = 1,    /* Watched object was removed */
> +};
> +
> +/*
> + * Notification record
> + */
> +struct watch_notification {

Kind of a long name...

> +     __u32                   type:24;        /* enum watch_notification_type 
> */
> +     __u32                   subtype:8;      /* Type-specific subtype 
> (filterable) */

16777216 diferent types and 256 different subtypes?  My gut instinct
wants a better balance, though I don't know where I'd draw the line.
Probably 12 bits for type and 10 for subtype?  OTOH I don't have a good
sense of how many distinct notification types an XFS would want to send
back to userspace, and maybe 256 subtypes is fine.  We could always
reserve another watch_notification_type if we need > 256.

Ok, no objections. :)

> +     __u32                   info;
> +#define WATCH_INFO_OVERRUN   0x00000001      /* Event(s) lost due to overrun 
> */
> +#define WATCH_INFO_ENOMEM    0x00000002      /* Event(s) lost due to ENOMEM 
> */
> +#define WATCH_INFO_RECURSIVE 0x00000004      /* Change was recursive */
> +#define WATCH_INFO_LENGTH    0x000001f8      /* Length of record / 
> sizeof(watch_notification) */

This is a mask, isn't it?  Could we perhaps have some helpers here?
Something along the lines of...

#define WATCH_INFO_LENGTH_MASK  0x000001f8
#define WATCH_INFO_LENGTH_SHIFT 3

static inline size_t watch_notification_length(struct watch_notification *wn)
{
        return (wn->info & WATCH_INFO_LENGTH_MASK) >> WATCH_INFO_LENGTH_SHIFT *
                        sizeof(struct watch_notification);
}

static inline struct watch_notification *watch_notification_next(
                struct watch_notification *wn)
{
        return wn + ((wn->info & WATCH_INFO_LENGTH_MASK) >>
                        WATCH_INFO_LENGTH_SHIFT);
}

...so that we don't have to opencode all of the ring buffer walking
magic and stuff?

(I might also shorten the namespace from WATCH_INFO_ to WNI_ ...)

Hmm so the length field is 6 bits and therefore the maximum size of a
notification record is ... 63 * (sizeof(u32)  * 2) = 504 bytes?  Which
means that kernel users can send back a maximum payload of 496 bytes?
That's probably big enough for random fs notifications (bad metadata
detected, media errors, etc.)

Judging from the sample program I guess all that userspace does is
allocate a memory buffer and toss it into the kernel, which then
initializes the ring management variables, and from there we just scan
around the ring buffer every time poll(watch_fd) says there's something
to do?  How does userspace tell the kernel the size of the ring buffer?

Does (watch_notification->info & WATCH_INFO_LENGTH) == 0 have any
meaning besides apparently "stop looking at me"?

> +#define WATCH_INFO_IN_SUBTREE        0x00000200      /* Change was not at 
> watched root */
> +#define WATCH_INFO_TYPE_FLAGS        0x00ff0000      /* Type-specific flags 
> */

WATCH_INFO_FLAG_MASK ?

> +#define WATCH_INFO_FLAG_0    0x00010000
> +#define WATCH_INFO_FLAG_1    0x00020000
> +#define WATCH_INFO_FLAG_2    0x00040000
> +#define WATCH_INFO_FLAG_3    0x00080000
> +#define WATCH_INFO_FLAG_4    0x00100000
> +#define WATCH_INFO_FLAG_5    0x00200000
> +#define WATCH_INFO_FLAG_6    0x00400000
> +#define WATCH_INFO_FLAG_7    0x00800000
> +#define WATCH_INFO_ID                0xff000000      /* ID of watchpoint */

WATCH_INFO_ID_MASK ?

> +#define WATCH_INFO_ID__SHIFT 24

Why double underscore here?

> +};
> +
> +#define WATCH_LENGTH_SHIFT   3
> +
> +struct watch_queue_buffer {
> +     union {
> +             /* The first few entries are special, containing the
> +              * ring management variables.

The first /two/ entries, correct?

Also, weird multiline comment style.

> +              */
> +             struct {
> +                     struct watch_notification watch; /* WATCH_TYPE_META */

Do these structures have to be cache-aligned for the atomic reads and
writes to work?

--D

> +                     __u32           head;           /* Ring head index */
> +                     __u32           tail;           /* Ring tail index */
> +                     __u32           mask;           /* Ring index mask */
> +             } meta;
> +             struct watch_notification slots[0];
> +     };
> +};
> +
> +#endif /* _UAPI_LINUX_WATCH_QUEUE_H */
> 

Reply via email to