Re: [PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-30 Thread Christoph Hellwig
On Tue, Mar 23, 2021 at 04:14:40PM +0800, Ming Lei wrote:
> On some ARCHs, such as aarch64, the page size may be 64K, and there may

Which we call arm64..

> be many CPU cores. relay_open() needs to allocate pages on each CPU for
> blktrace, so too many pages are easily taken by blktrace. For example,
> on one ARM64 server (224 CPU cores, 16G RAM), blktrace ended up
> allocating 7GB with 'blktrace -b 8192', which is used by the device-mapper
> test suite[1]. This can easily cause OOM.
> 
> Fix the issue by limiting the max allowed pages to 1/8 of totalram_pages().

Doesn't this break the blktrace ABI by using a different buffer size
and count than the user asked for?  I think we can enforce an upper
limit and error out, but silently adjusting the values seems wrong.

Wouldn't it make more sense to fix userspace to not request so many
and such big buffers instead?


Re: [PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-29 Thread Ming Lei
On Tue, Mar 30, 2021 at 10:57:04AM +0800, Su Yue wrote:
> 
> On Tue 23 Mar 2021 at 16:14, Ming Lei  wrote:
> 
> > On some ARCHs, such as aarch64, the page size may be 64K, and there may
> > be many CPU cores. relay_open() needs to allocate pages on each CPU for
> > blktrace, so too many pages are easily taken by blktrace. For example,
> > on one ARM64 server (224 CPU cores, 16G RAM), blktrace ended up
> > allocating 7GB with 'blktrace -b 8192', which is used by the
> > device-mapper test suite[1]. This can easily cause OOM.
> > 
> > Fix the issue by limiting the max allowed pages to 1/8 of
> > totalram_pages().
> > 
> > [1] https://github.com/jthornber/device-mapper-test-suite.git
> > 
> > Signed-off-by: Ming Lei 
> > ---
> >  kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> > index c221e4c3f625..8403ff19d533 100644
> > --- a/kernel/trace/blktrace.c
> > +++ b/kernel/trace/blktrace.c
> > @@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
> > }
> >  }
> > 
> > +/* limit total allocated buffer size to <= 1/8 of total pages */
> > +static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
> > +{
> > +   unsigned buf_size = buts->buf_size;
> > +   unsigned buf_nr = buts->buf_nr;
> > +   unsigned long max_allowed_pages = totalram_pages() >> 3;
> > +   unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
> > +
> > +   if (req_pages * num_online_cpus() <= max_allowed_pages)
> > +   return;
> > +
> > +   req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
> > +
> > +   if (req_pages == 0) {
> > +   buf_size = PAGE_SIZE;
> > +   buf_nr = 1;
> > +   } else {
> > +   buf_size = req_pages << PAGE_SHIFT / buf_nr;
> > 
> Should it be:
> buf_size = (req_pages << PAGE_SHIFT) / buf_nr;
> ?
> The priority of '<<' is lower than '/', right? :)

Good catch, thanks!

-- 
Ming



Re: [PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-29 Thread Su Yue

On Tue 23 Mar 2021 at 16:14, Ming Lei  wrote:

> On some ARCHs, such as aarch64, the page size may be 64K, and there may
> be many CPU cores. relay_open() needs to allocate pages on each CPU for
> blktrace, so too many pages are easily taken by blktrace. For example,
> on one ARM64 server (224 CPU cores, 16G RAM), blktrace ended up
> allocating 7GB with 'blktrace -b 8192', which is used by the device-mapper
> test suite[1]. This can easily cause OOM.
> 
> Fix the issue by limiting the max allowed pages to 1/8 of totalram_pages().
> 
> [1] https://github.com/jthornber/device-mapper-test-suite.git
> 
> Signed-off-by: Ming Lei 
> ---
>  kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index c221e4c3f625..8403ff19d533 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
> }
>  }
> 
> +/* limit total allocated buffer size to <= 1/8 of total pages */
> +static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
> +{
> +   unsigned buf_size = buts->buf_size;
> +   unsigned buf_nr = buts->buf_nr;
> +   unsigned long max_allowed_pages = totalram_pages() >> 3;
> +   unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
> +
> +   if (req_pages * num_online_cpus() <= max_allowed_pages)
> +   return;
> +
> +   req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
> +
> +   if (req_pages == 0) {
> +   buf_size = PAGE_SIZE;
> +   buf_nr = 1;
> +   } else {
> +   buf_size = req_pages << PAGE_SHIFT / buf_nr;

Should it be:
buf_size = (req_pages << PAGE_SHIFT) / buf_nr;
?
The priority of '<<' is lower than '/', right? :)

--
Su

> +   if (buf_size < PAGE_SIZE)
> +   buf_size = PAGE_SIZE;
> +   buf_nr = req_pages << PAGE_SHIFT / buf_size;
> +   if (buf_nr == 0)
> +   buf_nr = 1;
> +   }
> +
> +   buts->buf_size = min_t(unsigned, buf_size, buts->buf_size);
> +   buts->buf_nr = min_t(unsigned, buf_nr, buts->buf_nr);
> +}
> +
>  /*
>   * Setup everything required to start tracing
>   */
> @@ -482,6 +511,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
> if (!buts->buf_size || !buts->buf_nr)
> return -EINVAL;
> 
> +   /* make sure we do not allocate too much for userspace */
> +   validate_and_adjust_buf(buts);
> +
> strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
> buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';


[PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-23 Thread Ming Lei
On some ARCHs, such as aarch64, the page size may be 64K, and there may
be many CPU cores. relay_open() needs to allocate pages on each CPU for
blktrace, so too many pages are easily taken by blktrace. For example,
on one ARM64 server (224 CPU cores, 16G RAM), blktrace ended up allocating
7GB with 'blktrace -b 8192', which is used by the device-mapper test
suite[1]. This can easily cause OOM.

Fix the issue by limiting the max allowed pages to 1/8 of totalram_pages().

[1] https://github.com/jthornber/device-mapper-test-suite.git

Signed-off-by: Ming Lei 
---
 kernel/trace/blktrace.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index c221e4c3f625..8403ff19d533 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace *bt,
}
 }
 
+/* limit total allocated buffer size to <= 1/8 of total pages */
+static void validate_and_adjust_buf(struct blk_user_trace_setup *buts)
+{
+   unsigned buf_size = buts->buf_size;
+   unsigned buf_nr = buts->buf_nr;
+   unsigned long max_allowed_pages = totalram_pages() >> 3;
+   unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT;
+
+   if (req_pages * num_online_cpus() <= max_allowed_pages)
+   return;
+
+   req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus());
+
+   if (req_pages == 0) {
+   buf_size = PAGE_SIZE;
+   buf_nr = 1;
+   } else {
+   buf_size = req_pages << PAGE_SHIFT / buf_nr;
+   if (buf_size < PAGE_SIZE)
+   buf_size = PAGE_SIZE;
+   buf_nr = req_pages << PAGE_SHIFT / buf_size;
+   if (buf_nr == 0)
+   buf_nr = 1;
+   }
+
+   buts->buf_size = min_t(unsigned, buf_size, buts->buf_size);
+   buts->buf_nr = min_t(unsigned, buf_nr, buts->buf_nr);
+}
+
 /*
  * Setup everything required to start tracing
  */
@@ -482,6 +511,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev,
if (!buts->buf_size || !buts->buf_nr)
return -EINVAL;
 
+   /* make sure we do not allocate too much for userspace */
+   validate_and_adjust_buf(buts);
+
strncpy(buts->name, name, BLKTRACE_BDEV_SIZE);
buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
 
-- 
2.29.2