On Mon, 2018-02-12 at 12:05 -0800, Tejun Heo wrote:
> On Mon, Feb 12, 2018 at 09:03:25AM -0800, Tejun Heo wrote:
> > Hello, Daniel.
> > On Mon, Feb 12, 2018 at 06:00:13PM +0100, Daniel Borkmann wrote:
> > > [ +Dennis, +Tejun ]
> > >
> > > Looks like we're stuck in the percpu allocator with key/value sizes of
> > > 4 bytes each and a large number of entries (max_entries) in the
> > > reproducer in the above link.
> > >
> > > Could we have some __GFP_NORETRY semantics and let allocations fail
> > > instead of triggering the OOM killer?
> > For some part, maybe, but not generally. The virt area allocation
> > goes down to page table allocation which is hard coded to use
> > GFP_KERNEL in arch mm code.
> So, the following should convert the majority of allocations to use
> __GFP_NORETRY. It doesn't catch everything but should significantly
> lower the probability of hitting this and put this on the same footing
> as vmalloc. Can you see whether this is enough?
> Note that this patch isn't upstreamable. We definitely want to
> restrict this to the rebalance path, but it should be good enough for
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 9158e5a..0b4739f 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -81,7 +81,7 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
>  static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>  			    struct page **pages, int page_start, int page_end)
>  {
> -	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
> +	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_NORETRY;
>  	unsigned int cpu, tcpu;
>  	int i;
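
For anyone following along, here is a rough userspace sketch of what the
flag changes for a caller like this one. It is not kernel code: SIM_NORETRY,
struct sim_pool, sim_alloc_page() and sim_alloc_pages() are all made-up names
for illustration, and the "reclaim" loop only stands in for the retry and
OOM-killer slow path. The point is that with the flag set the page allocation
fails fast, and the caller unwinds what it already got and returns -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_NORETRY; not the kernel's definition. */
#define SIM_NORETRY 0x1u

/* Simulated page source with a fixed budget so failures are reproducible. */
struct sim_pool {
	size_t free_pages;	/* pages available without any extra effort */
	size_t reclaimable;	/* pages the slow "retry" path can still recover */
	unsigned int retries;	/* number of times the slow path ran */
};

/* Allocate one simulated page; with SIM_NORETRY, fail instead of retrying. */
static void *sim_alloc_page(struct sim_pool *pool, unsigned int flags)
{
	void *page;

	assert(pool);
	while (pool->free_pages == 0) {
		if ((flags & SIM_NORETRY) || pool->reclaimable == 0)
			return NULL;	/* fail fast, or nothing left to reclaim */
		/* Slow path: recover one page, then try again. */
		pool->reclaimable--;
		pool->free_pages++;
		pool->retries++;
	}
	pool->free_pages--;
	page = malloc(64);
	if (!page)
		pool->free_pages++;	/* undo the accounting on real failure */
	return page;
}

/* Allocate n pages; on any failure, free the ones already allocated. */
static int sim_alloc_pages(struct sim_pool *pool, void **pages, size_t n,
			   unsigned int flags)
{
	size_t i;

	assert(pool && pages);
	for (i = 0; i < n; i++) {
		pages[i] = sim_alloc_page(pool, flags);
		if (!pages[i])
			goto err;
	}
	return 0;

err:
	/* Unwind: release every page obtained before the failing one. */
	while (i-- > 0) {
		free(pages[i]);
		pages[i] = NULL;
		pool->free_pages++;
	}
	return -ENOMEM;
}
```

With a pool of two free pages and a request for four, the SIM_NORETRY call
returns -ENOMEM and leaves the pool as it found it, while the same request
without the flag goes through the slow path twice before succeeding.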
Also, I would consider using this fix, as I had warnings of CPUs being
stuck there for more than 50 ms:
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
@@ -92,6 +92,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);