Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-28 Thread Steven Rostedt
On Tue, 23 Apr 2024 12:04:15 -0400
"Liam R. Howlett"  wrote:

> > Nit: For all labels, please add a space before them. Otherwise, diffs will
> > show "unlock" as the function and not "ring_buffer_map", making it harder
> > to find where the change is.
> >   
> 
> Isn't the inclusion of a space before labels outdate since 'git diff'
> superseds 'diff' and fixed this issue?

I still use patch and quilt ;-) and that still suffers from that on
reject files (used for backporting and such).

-- Steve



Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-23 Thread Liam R. Howlett
* Steven Rostedt  [240410 13:41]:
> On Sat,  6 Apr 2024 18:36:46 +0100
> Vincent Donnefort  wrote:
> 
> > +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> > +   struct vm_area_struct *vma)
> > +{
> > +   struct ring_buffer_per_cpu *cpu_buffer;
> > +   unsigned long flags, *subbuf_ids;
> > +   int err = 0;
> > +
> > +   if (!cpumask_test_cpu(cpu, buffer->cpumask))
> > +   return -EINVAL;
> > +
> > +   cpu_buffer = buffer->buffers[cpu];
> > +
> > +   mutex_lock(_buffer->mapping_lock);
> > +
> > +   if (cpu_buffer->mapped) {
> > +   err = __rb_map_vma(cpu_buffer, vma);
> > +   if (!err)
> > +   err = __rb_inc_dec_mapped(cpu_buffer, true);
> > +   mutex_unlock(_buffer->mapping_lock);
> > +   return err;
> > +   }
> > +
> > +   /* prevent another thread from changing buffer/sub-buffer sizes */
> > +   mutex_lock(>mutex);
> > +
> > +   err = rb_alloc_meta_page(cpu_buffer);
> > +   if (err)
> > +   goto unlock;
> > +
> > +   /* subbuf_ids include the reader while nr_pages does not */
> > +   subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, sizeof(*subbuf_ids), 
> > GFP_KERNEL);
> > +   if (!subbuf_ids) {
> > +   rb_free_meta_page(cpu_buffer);
> > +   err = -ENOMEM;
> > +   goto unlock;
> > +   }
> > +
> > +   atomic_inc(_buffer->resize_disabled);
> > +
> > +   /*
> > +* Lock all readers to block any subbuf swap until the subbuf IDs are
> > +* assigned.
> > +*/
> > +   raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> > +   rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
> > +   raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> > +
> > +   err = __rb_map_vma(cpu_buffer, vma);
> > +   if (!err) {
> > +   raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> > +   cpu_buffer->mapped = 1;
> > +   raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> > +   } else {
> > +   kfree(cpu_buffer->subbuf_ids);
> > +   cpu_buffer->subbuf_ids = NULL;
> > +   rb_free_meta_page(cpu_buffer);
> > +   }
> > +unlock:
> 
> Nit: For all labels, please add a space before them. Otherwise, diffs will
> show "unlock" as the function and not "ring_buffer_map", making it harder
> to find where the change is.
> 

Isn't the inclusion of a space before labels outdate since 'git diff'
superseds 'diff' and fixed this issue?

Thanks,
Liam



Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-23 Thread David Hildenbrand

On 22.04.24 22:31, Vincent Donnefort wrote:

On Mon, Apr 22, 2024 at 08:27:17PM +0200, David Hildenbrand wrote:

On 22.04.24 20:20, Vincent Donnefort wrote:

Hi David,

Thanks for having a look, very much appreciated!

On Mon, Apr 22, 2024 at 11:27:11AM +0200, David Hildenbrand wrote:

On 19.04.24 20:25, David Hildenbrand wrote:

On 06.04.24 19:36, Vincent Donnefort wrote:

In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

  ring_buffer_{map,unmap}()

And controls on the ring-buffer:

  ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

  A unique ID for each subbuf of the ring-buffer, currently they are
  only identified through their in-kernel VA.

  A meta-page, where are stored ring-buffer statistics and a
  description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
 #include 
 #include 
+#include 
+
 struct trace_buffer;
 struct ring_buffer_iter;
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
hlist_node *node);
 #define trace_rb_cpu_prepare   NULL
 #endif
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,
+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
 #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:Number of subbfs in the ring-buffer, including 
the reader.
+ * @reader.lost_events:Number of events lost at the time of the reader 
swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : 
@nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..793ecc454039 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -26,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
 };
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-22 Thread Vincent Donnefort
On Mon, Apr 22, 2024 at 08:27:17PM +0200, David Hildenbrand wrote:
> On 22.04.24 20:20, Vincent Donnefort wrote:
> > Hi David,
> > 
> > Thanks for having a look, very much appreciated!
> > 
> > On Mon, Apr 22, 2024 at 11:27:11AM +0200, David Hildenbrand wrote:
> > > On 19.04.24 20:25, David Hildenbrand wrote:
> > > > On 06.04.24 19:36, Vincent Donnefort wrote:
> > > > > In preparation for allowing the user-space to map a ring-buffer, add
> > > > > a set of mapping functions:
> > > > > 
> > > > >  ring_buffer_{map,unmap}()
> > > > > 
> > > > > And controls on the ring-buffer:
> > > > > 
> > > > >  ring_buffer_map_get_reader()  /* swap reader and head */
> > > > > 
> > > > > Mapping the ring-buffer also involves:
> > > > > 
> > > > >  A unique ID for each subbuf of the ring-buffer, currently they 
> > > > > are
> > > > >  only identified through their in-kernel VA.
> > > > > 
> > > > >  A meta-page, where are stored ring-buffer statistics and a
> > > > >  description for the current reader
> > > > > 
> > > > > The linear mapping exposes the meta-page, and each subbuf of the
> > > > > ring-buffer, ordered following their unique ID, assigned during the
> > > > > first mapping.
> > > > > 
> > > > > Once mapped, no subbuf can get in or out of the ring-buffer: the 
> > > > > buffer
> > > > > size will remain unmodified and the splice enabling functions will in
> > > > > reality simply memcpy the data instead of swapping subbufs.
> > > > > 
> > > > > CC: 
> > > > > Signed-off-by: Vincent Donnefort 
> > > > > 
> > > > > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > > > > index dc5ae4e96aee..96d2140b471e 100644
> > > > > --- a/include/linux/ring_buffer.h
> > > > > +++ b/include/linux/ring_buffer.h
> > > > > @@ -6,6 +6,8 @@
> > > > > #include 
> > > > > #include 
> > > > > +#include 
> > > > > +
> > > > > struct trace_buffer;
> > > > > struct ring_buffer_iter;
> > > > > @@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
> > > > > hlist_node *node);
> > > > > #define trace_rb_cpu_prepare  NULL
> > > > > #endif
> > > > > +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> > > > > + struct vm_area_struct *vma);
> > > > > +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> > > > > +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
> > > > > #endif /* _LINUX_RING_BUFFER_H */
> > > > > diff --git a/include/uapi/linux/trace_mmap.h 
> > > > > b/include/uapi/linux/trace_mmap.h
> > > > > new file mode 100644
> > > > > index ..ffcd8dfcaa4f
> > > > > --- /dev/null
> > > > > +++ b/include/uapi/linux/trace_mmap.h
> > > > > @@ -0,0 +1,46 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > > > +#ifndef _TRACE_MMAP_H_
> > > > > +#define _TRACE_MMAP_H_
> > > > > +
> > > > > +#include 
> > > > > +
> > > > > +/**
> > > > > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > > > > + * @meta_page_size:  Size of this meta-page.
> > > > > + * @meta_struct_len: Size of this structure.
> > > > > + * @subbuf_size: Size of each sub-buffer.
> > > > > + * @nr_subbufs:  Number of subbfs in the ring-buffer, 
> > > > > including the reader.
> > > > > + * @reader.lost_events:  Number of events lost at the time of 
> > > > > the reader swap.
> > > > > + * @reader.id:   subbuf ID of the current reader. ID 
> > > > > range [0 : @nr_subbufs - 1]
> > > > > + * @reader.read: Number of bytes read on the reader subbuf.
> > > > > + * @flags:   Placeholder for now, 0 until new features are 
> > > > > supported.
> > > > > + * @entries: Number of entries in the ring-buffer.
> > > > > + * @overrun: Number of entries lost in the ring-buffer.
> > > > > + * @read:Number of entries that have been read.
> > > > > + * @Reserved1:   Reserved for future use.
> > > > > + * @Reserved2:   Reserved for future use.
> > > > > + */
> > > > > +struct trace_buffer_meta {
> > > > > + __u32   meta_page_size;
> > > > > + __u32   meta_struct_len;
> > > > > +
> > > > > + __u32   subbuf_size;
> > > > > + __u32   nr_subbufs;
> > > > > +
> > > > > + struct {
> > > > > + __u64   lost_events;
> > > > > + __u32   id;
> > > > > + __u32   read;
> > > > > + } reader;
> > > > > +
> > > > > + __u64   flags;
> > > > > +
> > > > > + __u64   entries;
> > > > > + __u64   overrun;
> > > > > + __u64   read;
> > > > > +
> > > > > + __u64   Reserved1;
> > > > > + __u64   Reserved2;
> > > > > +};
> > > > > +
> > > > > +#endif /* _TRACE_MMAP_H_ */
> > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > > > index cc9ebe593571..793ecc454039 100644
> > > > > --- a/kernel/trace/ring_buffer.c
> > > > > +++ b/kernel/trace/ring_buffer.c
> > > > > @@ -9,6 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-22 Thread David Hildenbrand

On 22.04.24 20:20, Vincent Donnefort wrote:

Hi David,

Thanks for having a look, very much appreciated!

On Mon, Apr 22, 2024 at 11:27:11AM +0200, David Hildenbrand wrote:

On 19.04.24 20:25, David Hildenbrand wrote:

On 06.04.24 19:36, Vincent Donnefort wrote:

In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

 ring_buffer_{map,unmap}()

And controls on the ring-buffer:

 ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

 A unique ID for each subbuf of the ring-buffer, currently they are
 only identified through their in-kernel VA.

 A meta-page, where are stored ring-buffer statistics and a
 description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
#include 
#include 
+#include 
+
struct trace_buffer;
struct ring_buffer_iter;
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
hlist_node *node);
#define trace_rb_cpu_prepareNULL
#endif
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,
+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
#endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:Number of subbfs in the ring-buffer, including 
the reader.
+ * @reader.lost_events:Number of events lost at the time of the reader 
swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : 
@nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..793ecc454039 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
#include 
#include 
#include 
+#include 
#include 
#include 
#include 
@@ -26,6 +27,7 @@
#include 
#include 
#include 
+#include 
#include 
#include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
};
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {
u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_head

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-22 Thread Vincent Donnefort
Hi David,

Thanks for having a look, very much appreciated!

On Mon, Apr 22, 2024 at 11:27:11AM +0200, David Hildenbrand wrote:
> On 19.04.24 20:25, David Hildenbrand wrote:
> > On 06.04.24 19:36, Vincent Donnefort wrote:
> > > In preparation for allowing the user-space to map a ring-buffer, add
> > > a set of mapping functions:
> > > 
> > > ring_buffer_{map,unmap}()
> > > 
> > > And controls on the ring-buffer:
> > > 
> > > ring_buffer_map_get_reader()  /* swap reader and head */
> > > 
> > > Mapping the ring-buffer also involves:
> > > 
> > > A unique ID for each subbuf of the ring-buffer, currently they are
> > > only identified through their in-kernel VA.
> > > 
> > > A meta-page, where are stored ring-buffer statistics and a
> > > description for the current reader
> > > 
> > > The linear mapping exposes the meta-page, and each subbuf of the
> > > ring-buffer, ordered following their unique ID, assigned during the
> > > first mapping.
> > > 
> > > Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> > > size will remain unmodified and the splice enabling functions will in
> > > reality simply memcpy the data instead of swapping subbufs.
> > > 
> > > CC: 
> > > Signed-off-by: Vincent Donnefort 
> > > 
> > > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > > index dc5ae4e96aee..96d2140b471e 100644
> > > --- a/include/linux/ring_buffer.h
> > > +++ b/include/linux/ring_buffer.h
> > > @@ -6,6 +6,8 @@
> > >#include 
> > >#include 
> > > +#include 
> > > +
> > >struct trace_buffer;
> > >struct ring_buffer_iter;
> > > @@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
> > > hlist_node *node);
> > >#define trace_rb_cpu_prepare   NULL
> > >#endif
> > > +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> > > + struct vm_area_struct *vma);
> > > +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> > > +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
> > >#endif /* _LINUX_RING_BUFFER_H */
> > > diff --git a/include/uapi/linux/trace_mmap.h 
> > > b/include/uapi/linux/trace_mmap.h
> > > new file mode 100644
> > > index ..ffcd8dfcaa4f
> > > --- /dev/null
> > > +++ b/include/uapi/linux/trace_mmap.h
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +#ifndef _TRACE_MMAP_H_
> > > +#define _TRACE_MMAP_H_
> > > +
> > > +#include 
> > > +
> > > +/**
> > > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > > + * @meta_page_size:  Size of this meta-page.
> > > + * @meta_struct_len: Size of this structure.
> > > + * @subbuf_size: Size of each sub-buffer.
> > > + * @nr_subbufs:  Number of subbfs in the ring-buffer, including 
> > > the reader.
> > > + * @reader.lost_events:  Number of events lost at the time of the reader 
> > > swap.
> > > + * @reader.id:   subbuf ID of the current reader. ID range [0 : 
> > > @nr_subbufs - 1]
> > > + * @reader.read: Number of bytes read on the reader subbuf.
> > > + * @flags:   Placeholder for now, 0 until new features are 
> > > supported.
> > > + * @entries: Number of entries in the ring-buffer.
> > > + * @overrun: Number of entries lost in the ring-buffer.
> > > + * @read:Number of entries that have been read.
> > > + * @Reserved1:   Reserved for future use.
> > > + * @Reserved2:   Reserved for future use.
> > > + */
> > > +struct trace_buffer_meta {
> > > + __u32   meta_page_size;
> > > + __u32   meta_struct_len;
> > > +
> > > + __u32   subbuf_size;
> > > + __u32   nr_subbufs;
> > > +
> > > + struct {
> > > + __u64   lost_events;
> > > + __u32   id;
> > > + __u32   read;
> > > + } reader;
> > > +
> > > + __u64   flags;
> > > +
> > > + __u64   entries;
> > > + __u64   overrun;
> > > + __u64   read;
> > > +
> > > + __u64   Reserved1;
> > > + __u64   Reserved2;
> > > +};
> > > +
> > > +#endif /* _TRACE_MMAP_H_ */
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index cc9ebe593571..793ecc454039 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > > @@ -9,6 +9,7 @@
> > >#include 
> > >#include 
> > >#include 
> > > +#include 
> > >#include 
> > >#include 
> > >#include 
> > > @@ -26,6 +27,7 @@
> > >#include 
> > >#include 
> > >#include 
> > > +#include 
> > >#include 
> > >#include 
> > > @@ -338,6 +340,7 @@ struct buffer_page {
> > >   local_t  entries;   /* entries on this page */
> > >   unsigned longreal_end;  /* real end of data */
> > >   unsigned order; /* order of the page */
> > > + u32  id;/* ID for external mapping */
> > >   struct buffer_data_page *page;  /* Actual data page */
> > >

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-22 Thread Vincent Donnefort
On Thu, Apr 18, 2024 at 11:43:46PM -0400, Steven Rostedt wrote:
> On Thu, 18 Apr 2024 09:55:55 +0300
> Mike Rapoport  wrote:
> 
> Hi Mike,
> 
> Thanks for doing this review!
> 
> > > +/**
> > > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > > + * @meta_page_size:  Size of this meta-page.
> > > + * @meta_struct_len: Size of this structure.
> > > + * @subbuf_size: Size of each sub-buffer.
> > > + * @nr_subbufs:  Number of subbfs in the ring-buffer, including 
> > > the reader.
> > > + * @reader.lost_events:  Number of events lost at the time of the reader 
> > > swap.
> > > + * @reader.id:   subbuf ID of the current reader. ID range [0 : 
> > > @nr_subbufs - 1]
> > > + * @reader.read: Number of bytes read on the reader subbuf.
> > > + * @flags:   Placeholder for now, 0 until new features are 
> > > supported.
> > > + * @entries: Number of entries in the ring-buffer.
> > > + * @overrun: Number of entries lost in the ring-buffer.
> > > + * @read:Number of entries that have been read.
> > > + * @Reserved1:   Reserved for future use.
> > > + * @Reserved2:   Reserved for future use.
> > > + */
> > > +struct trace_buffer_meta {
> > > + __u32   meta_page_size;
> > > + __u32   meta_struct_len;
> > > +
> > > + __u32   subbuf_size;
> > > + __u32   nr_subbufs;
> > > +
> > > + struct {
> > > + __u64   lost_events;
> > > + __u32   id;
> > > + __u32   read;
> > > + } reader;
> > > +
> > > + __u64   flags;
> > > +
> > > + __u64   entries;
> > > + __u64   overrun;
> > > + __u64   read;
> > > +
> > > + __u64   Reserved1;
> > > + __u64   Reserved2;  
> > 
> > Why do you need reserved fields? This structure always resides in the
> > beginning of a page and the rest of the page is essentially "reserved".
> 
> So this code is also going to be used in arm's pkvm hypervisor code,
> where it will be using these fields, but since we are looking at
> keeping the same interface between the two, we don't want these used by
> this interface.
> 
> We probably should add a comment about that.
> 
> > 
> > > +};
> > > +
> > > +#endif /* _TRACE_MMAP_H_ */
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index cc9ebe593571..793ecc454039 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c  
> > 
> > ... 
> > 
> > > +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu 
> > > *cpu_buffer,
> > > +unsigned long *subbuf_ids)
> > > +{
> > > + struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > > + unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> > > + struct buffer_page *first_subbuf, *subbuf;
> > > + int id = 0;
> > > +
> > > + subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> > > + cpu_buffer->reader_page->id = id++;
> > > +
> > > + first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> > > + do {
> > > + if (WARN_ON(id >= nr_subbufs))
> > > + break;
> > > +
> > > + subbuf_ids[id] = (unsigned long)subbuf->page;
> > > + subbuf->id = id;
> > > +
> > > + rb_inc_page();
> > > + id++;
> > > + } while (subbuf != first_subbuf);
> > > +
> > > + /* install subbuf ID to kern VA translation */
> > > + cpu_buffer->subbuf_ids = subbuf_ids;
> > > +
> > > + /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> > > + meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;  
> > 
> > Isn't this a single page?
> 
> One thing we are doing is to make sure that the subbuffers are aligned
> by their size. If a subbuffer is 3 pages, it should be aligned on 3
> page boundaries. This was something that Linus suggested.
> 
> > 
> > > + meta->meta_struct_len = sizeof(*meta);
> > > + meta->nr_subbufs = nr_subbufs;
> > > + meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> > > +
> > > + rb_update_meta_page(cpu_buffer);
> > > +}  
> > 
> > ...
> > 
> > > +#define subbuf_page(off, start) \
> > > + virt_to_page((void *)((start) + ((off) << PAGE_SHIFT)))
> > > +
> > > +#define foreach_subbuf_page(sub_order, start, page)  \  
> > 
> > Nit: usually iterators in kernel use for_each
> 
> Ah, good catch. Yeah, that should be changed. But then ...
> 
> > 
> > > + page = subbuf_page(0, (start)); \
> > > + for (int __off = 0; __off < (1 << (sub_order)); \
> > > +  __off++, page = subbuf_page(__off, (start)))  
> > 
> > The pages are allocated with alloc_pages_node(.. subbuf_order) are
> > physically contiguous and struct pages for them are also contiguous, so
> > inside a subbuf_order allocation you can just do page++.
> > 
> 
> I'm wondering if we should just nuke the macro. It was there because of
> the previous implementation did things twice. But now it's just done
> once here:
> 
> + while (s < nr_subbufs && p < nr_pages) 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-22 Thread David Hildenbrand

On 19.04.24 20:25, David Hildenbrand wrote:

On 06.04.24 19:36, Vincent Donnefort wrote:

In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

ring_buffer_{map,unmap}()

And controls on the ring-buffer:

ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

A unique ID for each subbuf of the ring-buffer, currently they are
only identified through their in-kernel VA.

A meta-page, where are stored ring-buffer statistics and a
description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
   #include 
   #include 
   
+#include 

+
   struct trace_buffer;
   struct ring_buffer_iter;
   
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);

   #define trace_rb_cpu_prepare NULL
   #endif
   
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,

+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
   #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:Number of subbfs in the ring-buffer, including 
the reader.
+ * @reader.lost_events:Number of events lost at the time of the reader 
swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : 
@nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..793ecc454039 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -26,6 +27,7 @@
   #include 
   #include 
   #include 
+#include 
   
   #include 

   #include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
   };
   
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {

u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_headnew_pages; /* new pages to add */
@@ -1599,6 +1608,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long 
nr_pages, int cpu)
init_irq_work(_buffer->irq_work.work, 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-19 Thread David Hildenbrand

On 06.04.24 19:36, Vincent Donnefort wrote:

In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:

   ring_buffer_{map,unmap}()

And controls on the ring-buffer:

   ring_buffer_map_get_reader()  /* swap reader and head */

Mapping the ring-buffer also involves:

   A unique ID for each subbuf of the ring-buffer, currently they are
   only identified through their in-kernel VA.

   A meta-page, where are stored ring-buffer statistics and a
   description for the current reader

The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.

Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.

CC: 
Signed-off-by: Vincent Donnefort 

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dc5ae4e96aee..96d2140b471e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -6,6 +6,8 @@
  #include 
  #include 
  
+#include 

+
  struct trace_buffer;
  struct ring_buffer_iter;
  
@@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);

  #define trace_rb_cpu_prepare  NULL
  #endif
  
+int ring_buffer_map(struct trace_buffer *buffer, int cpu,

+   struct vm_area_struct *vma);
+int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
+int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
  #endif /* _LINUX_RING_BUFFER_H */
diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
new file mode 100644
index ..ffcd8dfcaa4f
--- /dev/null
+++ b/include/uapi/linux/trace_mmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _TRACE_MMAP_H_
+#define _TRACE_MMAP_H_
+
+#include 
+
+/**
+ * struct trace_buffer_meta - Ring-buffer Meta-page description
+ * @meta_page_size:Size of this meta-page.
+ * @meta_struct_len:   Size of this structure.
+ * @subbuf_size:   Size of each sub-buffer.
+ * @nr_subbufs:Number of subbfs in the ring-buffer, including 
the reader.
+ * @reader.lost_events:Number of events lost at the time of the reader 
swap.
+ * @reader.id: subbuf ID of the current reader. ID range [0 : 
@nr_subbufs - 1]
+ * @reader.read:   Number of bytes read on the reader subbuf.
+ * @flags: Placeholder for now, 0 until new features are supported.
+ * @entries:   Number of entries in the ring-buffer.
+ * @overrun:   Number of entries lost in the ring-buffer.
+ * @read:  Number of entries that have been read.
+ * @Reserved1: Reserved for future use.
+ * @Reserved2: Reserved for future use.
+ */
+struct trace_buffer_meta {
+   __u32   meta_page_size;
+   __u32   meta_struct_len;
+
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+
+   struct {
+   __u64   lost_events;
+   __u32   id;
+   __u32   read;
+   } reader;
+
+   __u64   flags;
+
+   __u64   entries;
+   __u64   overrun;
+   __u64   read;
+
+   __u64   Reserved1;
+   __u64   Reserved2;
+};
+
+#endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cc9ebe593571..793ecc454039 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -26,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -338,6 +340,7 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
+   u32  id;/* ID for external mapping */
struct buffer_data_page *page;  /* Actual data page */
  };
  
@@ -484,6 +487,12 @@ struct ring_buffer_per_cpu {

u64 read_stamp;
/* pages removed since last reset */
unsigned long   pages_removed;
+
+   unsigned intmapped;
+   struct mutexmapping_lock;
+   unsigned long   *subbuf_ids;/* ID to subbuf VA */
+   struct trace_buffer_meta*meta_page;
+
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
struct list_headnew_pages; /* new pages to add */
@@ -1599,6 +1608,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long 
nr_pages, int cpu)
init_irq_work(_buffer->irq_work.work, rb_wake_up_waiters);
init_waitqueue_head(_buffer->irq_work.waiters);
 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2024 09:55:55 +0300
Mike Rapoport  wrote:

Hi Mike,

Thanks for doing this review!

> > +/**
> > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > + * @meta_page_size:Size of this meta-page.
> > + * @meta_struct_len:   Size of this structure.
> > + * @subbuf_size:   Size of each sub-buffer.
> > + * @nr_subbufs:Number of subbfs in the ring-buffer, including 
> > the reader.
> > + * @reader.lost_events:Number of events lost at the time of the reader 
> > swap.
> > + * @reader.id: subbuf ID of the current reader. ID range [0 : 
> > @nr_subbufs - 1]
> > + * @reader.read:   Number of bytes read on the reader subbuf.
> > + * @flags: Placeholder for now, 0 until new features are supported.
> > + * @entries:   Number of entries in the ring-buffer.
> > + * @overrun:   Number of entries lost in the ring-buffer.
> > + * @read:  Number of entries that have been read.
> > + * @Reserved1: Reserved for future use.
> > + * @Reserved2: Reserved for future use.
> > + */
> > +struct trace_buffer_meta {
> > +   __u32   meta_page_size;
> > +   __u32   meta_struct_len;
> > +
> > +   __u32   subbuf_size;
> > +   __u32   nr_subbufs;
> > +
> > +   struct {
> > +   __u64   lost_events;
> > +   __u32   id;
> > +   __u32   read;
> > +   } reader;
> > +
> > +   __u64   flags;
> > +
> > +   __u64   entries;
> > +   __u64   overrun;
> > +   __u64   read;
> > +
> > +   __u64   Reserved1;
> > +   __u64   Reserved2;  
> 
> Why do you need reserved fields? This structure always resides in the
> beginning of a page and the rest of the page is essentially "reserved".

So this code is also going to be used in arm's pkvm hypervisor code,
where it will be using these fields, but since we are looking at
keeping the same interface between the two, we don't want these used by
this interface.

We probably should add a comment about that.

> 
> > +};
> > +
> > +#endif /* _TRACE_MMAP_H_ */
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index cc9ebe593571..793ecc454039 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c  
> 
> ... 
> 
> > +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
> > +  unsigned long *subbuf_ids)
> > +{
> > +   struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > +   unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> > +   struct buffer_page *first_subbuf, *subbuf;
> > +   int id = 0;
> > +
> > +   subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> > +   cpu_buffer->reader_page->id = id++;
> > +
> > +   first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> > +   do {
> > +   if (WARN_ON(id >= nr_subbufs))
> > +   break;
> > +
> > +   subbuf_ids[id] = (unsigned long)subbuf->page;
> > +   subbuf->id = id;
> > +
> > +   rb_inc_page();
> > +   id++;
> > +   } while (subbuf != first_subbuf);
> > +
> > +   /* install subbuf ID to kern VA translation */
> > +   cpu_buffer->subbuf_ids = subbuf_ids;
> > +
> > +   /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> > +   meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;  
> 
> Isn't this a single page?

One thing we are doing is to make sure that the subbuffers are aligned
by their size. If a subbuffer is 3 pages, it should be aligned on 3
page boundaries. This was something that Linus suggested.

> 
> > +   meta->meta_struct_len = sizeof(*meta);
> > +   meta->nr_subbufs = nr_subbufs;
> > +   meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> > +
> > +   rb_update_meta_page(cpu_buffer);
> > +}  
> 
> ...
> 
> > +#define subbuf_page(off, start) \
> > +   virt_to_page((void *)((start) + ((off) << PAGE_SHIFT)))
> > +
> > +#define foreach_subbuf_page(sub_order, start, page)\  
> 
> Nit: usually iterators in kernel use for_each

Ah, good catch. Yeah, that should be changed. But then ...

> 
> > +   page = subbuf_page(0, (start)); \
> > +   for (int __off = 0; __off < (1 << (sub_order)); \
> > +__off++, page = subbuf_page(__off, (start)))  
> 
> The pages are allocated with alloc_pages_node(.. subbuf_order) are
> physically contiguous and struct pages for them are also contiguous, so
> inside a subbuf_order allocation you can just do page++.
> 

I'm wondering if we should just nuke the macro. It was there because of
the previous implementation did things twice. But now it's just done
once here:

+   while (s < nr_subbufs && p < nr_pages) {
+   struct page *page;
+
+   foreach_subbuf_page(subbuf_order, cpu_buffer->subbuf_ids[s], 
page) {
+   if (p >= nr_pages)
+   break;
+
+   pages[p++] = page;
+ 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-18 Thread Mike Rapoport
On Sat, Apr 06, 2024 at 06:36:46PM +0100, Vincent Donnefort wrote:
> In preparation for allowing the user-space to map a ring-buffer, add
> a set of mapping functions:
> 
>   ring_buffer_{map,unmap}()
> 
> And controls on the ring-buffer:
> 
>   ring_buffer_map_get_reader()  /* swap reader and head */
> 
> Mapping the ring-buffer also involves:
> 
>   A unique ID for each subbuf of the ring-buffer, currently they are
>   only identified through their in-kernel VA.
> 
>   A meta-page, where are stored ring-buffer statistics and a
>   description for the current reader
> 
> The linear mapping exposes the meta-page, and each subbuf of the
> ring-buffer, ordered following their unique ID, assigned during the
> first mapping.
> 
> Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> size will remain unmodified and the splice enabling functions will in
> reality simply memcpy the data instead of swapping subbufs.
> 
> CC: 
> Signed-off-by: Vincent Donnefort 
> 
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index dc5ae4e96aee..96d2140b471e 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -6,6 +6,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  struct trace_buffer;
>  struct ring_buffer_iter;
>  
> @@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct 
> hlist_node *node);
>  #define trace_rb_cpu_prepare NULL
>  #endif
>  
> +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> + struct vm_area_struct *vma);
> +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
>  #endif /* _LINUX_RING_BUFFER_H */
> diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> new file mode 100644
> index ..ffcd8dfcaa4f
> --- /dev/null
> +++ b/include/uapi/linux/trace_mmap.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _TRACE_MMAP_H_
> +#define _TRACE_MMAP_H_
> +
> +#include 
> +
> +/**
> + * struct trace_buffer_meta - Ring-buffer Meta-page description
> + * @meta_page_size:  Size of this meta-page.
> + * @meta_struct_len: Size of this structure.
> + * @subbuf_size: Size of each sub-buffer.
> + * @nr_subbufs:  Number of subbfs in the ring-buffer, including 
> the reader.
> + * @reader.lost_events:  Number of events lost at the time of the reader 
> swap.
> + * @reader.id:   subbuf ID of the current reader. ID range [0 : 
> @nr_subbufs - 1]
> + * @reader.read: Number of bytes read on the reader subbuf.
> + * @flags:   Placeholder for now, 0 until new features are supported.
> + * @entries: Number of entries in the ring-buffer.
> + * @overrun: Number of entries lost in the ring-buffer.
> + * @read:Number of entries that have been read.
> + * @Reserved1:   Reserved for future use.
> + * @Reserved2:   Reserved for future use.
> + */
> +struct trace_buffer_meta {
> + __u32   meta_page_size;
> + __u32   meta_struct_len;
> +
> + __u32   subbuf_size;
> + __u32   nr_subbufs;
> +
> + struct {
> + __u64   lost_events;
> + __u32   id;
> + __u32   read;
> + } reader;
> +
> + __u64   flags;
> +
> + __u64   entries;
> + __u64   overrun;
> + __u64   read;
> +
> + __u64   Reserved1;
> + __u64   Reserved2;

Why do you need reserved fields? This structure always resides in the
beginning of a page and the rest of the page is essentially "reserved".

> +};
> +
> +#endif /* _TRACE_MMAP_H_ */
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index cc9ebe593571..793ecc454039 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c

... 

> +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
> +unsigned long *subbuf_ids)
> +{
> + struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> + unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> + struct buffer_page *first_subbuf, *subbuf;
> + int id = 0;
> +
> + subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> + cpu_buffer->reader_page->id = id++;
> +
> + first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> + do {
> + if (WARN_ON(id >= nr_subbufs))
> + break;
> +
> + subbuf_ids[id] = (unsigned long)subbuf->page;
> + subbuf->id = id;
> +
> + rb_inc_page();
> + id++;
> + } while (subbuf != first_subbuf);
> +
> + /* install subbuf ID to kern VA translation */
> + cpu_buffer->subbuf_ids = subbuf_ids;
> +
> + /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> + meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;

Isn't this a 

Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions

2024-04-10 Thread Steven Rostedt
On Sat,  6 Apr 2024 18:36:46 +0100
Vincent Donnefort  wrote:

> +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> + struct vm_area_struct *vma)
> +{
> + struct ring_buffer_per_cpu *cpu_buffer;
> + unsigned long flags, *subbuf_ids;
> + int err = 0;
> +
> + if (!cpumask_test_cpu(cpu, buffer->cpumask))
> + return -EINVAL;
> +
> + cpu_buffer = buffer->buffers[cpu];
> +
> + mutex_lock(_buffer->mapping_lock);
> +
> + if (cpu_buffer->mapped) {
> + err = __rb_map_vma(cpu_buffer, vma);
> + if (!err)
> + err = __rb_inc_dec_mapped(cpu_buffer, true);
> + mutex_unlock(_buffer->mapping_lock);
> + return err;
> + }
> +
> + /* prevent another thread from changing buffer/sub-buffer sizes */
> + mutex_lock(>mutex);
> +
> + err = rb_alloc_meta_page(cpu_buffer);
> + if (err)
> + goto unlock;
> +
> + /* subbuf_ids include the reader while nr_pages does not */
> + subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, sizeof(*subbuf_ids), 
> GFP_KERNEL);
> + if (!subbuf_ids) {
> + rb_free_meta_page(cpu_buffer);
> + err = -ENOMEM;
> + goto unlock;
> + }
> +
> + atomic_inc(_buffer->resize_disabled);
> +
> + /*
> +  * Lock all readers to block any subbuf swap until the subbuf IDs are
> +  * assigned.
> +  */
> + raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> + rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
> + raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> +
> + err = __rb_map_vma(cpu_buffer, vma);
> + if (!err) {
> + raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> + cpu_buffer->mapped = 1;
> + raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> + } else {
> + kfree(cpu_buffer->subbuf_ids);
> + cpu_buffer->subbuf_ids = NULL;
> + rb_free_meta_page(cpu_buffer);
> + }
> +unlock:

Nit: For all labels, please add a space before them. Otherwise, diffs will
show "unlock" as the function and not "ring_buffer_map", making it harder
to find where the change is.

Same for the labels below.

-- Steve


> + mutex_unlock(>mutex);
> + mutex_unlock(_buffer->mapping_lock);
> +
> + return err;
> +}
> +
> +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
> +{
> + struct ring_buffer_per_cpu *cpu_buffer;
> + unsigned long flags;
> + int err = 0;
> +
> + if (!cpumask_test_cpu(cpu, buffer->cpumask))
> + return -EINVAL;
> +
> + cpu_buffer = buffer->buffers[cpu];
> +
> + mutex_lock(_buffer->mapping_lock);
> +
> + if (!cpu_buffer->mapped) {
> + err = -ENODEV;
> + goto out;
> + } else if (cpu_buffer->mapped > 1) {
> + __rb_inc_dec_mapped(cpu_buffer, false);
> + goto out;
> + }
> +
> + mutex_lock(>mutex);
> + raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> +
> + cpu_buffer->mapped = 0;
> +
> + raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> +
> + kfree(cpu_buffer->subbuf_ids);
> + cpu_buffer->subbuf_ids = NULL;
> + rb_free_meta_page(cpu_buffer);
> + atomic_dec(_buffer->resize_disabled);
> +
> + mutex_unlock(>mutex);
> +out:
> + mutex_unlock(_buffer->mapping_lock);
> +
> + return err;
> +}
> +
> +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
> +{
> + struct ring_buffer_per_cpu *cpu_buffer;
> + unsigned long reader_size;
> + unsigned long flags;
> +
> + cpu_buffer = rb_get_mapped_buffer(buffer, cpu);
> + if (IS_ERR(cpu_buffer))
> + return (int)PTR_ERR(cpu_buffer);
> +
> + raw_spin_lock_irqsave(_buffer->reader_lock, flags);
> +consume:
> + if (rb_per_cpu_empty(cpu_buffer))
> + goto out;
> +
> + reader_size = rb_page_size(cpu_buffer->reader_page);
> +
> + /*
> +  * There are data to be read on the current reader page, we can
> +  * return to the caller. But before that, we assume the latter will read
> +  * everything. Let's update the kernel reader accordingly.
> +  */
> + if (cpu_buffer->reader_page->read < reader_size) {
> + while (cpu_buffer->reader_page->read < reader_size)
> + rb_advance_reader(cpu_buffer);
> + goto out;
> + }
> +
> + if (WARN_ON(!rb_get_reader_page(cpu_buffer)))
> + goto out;
> +
> + goto consume;
> +out:
> + /* Some archs do not have data cache coherency between kernel and 
> user-space */
> + flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
> +
> + rb_update_meta_page(cpu_buffer);
> +
> + raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
> + rb_put_mapped_buffer(cpu_buffer);
> +
> + return 0;
> +}
> +
>  /*
>   * We only allocate new buffers, never free them if the CPU goes down.
>   *