On Mon, Feb 12, 2018 at 09:03:25AM -0800, Tejun Heo wrote:
> Hello, Daniel.
> On Mon, Feb 12, 2018 at 06:00:13PM +0100, Daniel Borkmann wrote:
> > [ +Dennis, +Tejun ]
> > Looks like we're stuck in percpu allocator with key/value size of 4 bytes
> > each and large number of entries (max_entries) in the reproducer in above
> > link.
> > Could we have some __GFP_NORETRY semantics and let allocations fail instead
> > of triggering OOM killer?
> For some part, maybe, but not generally. The virt area allocation
> goes down to page table allocation which is hard coded to use
> GFP_KERNEL in arch mm code.
So, the following should convert the majority of allocations to use
__GFP_NORETRY. It doesn't catch everything, but it should significantly
lower the probability of hitting this and put the percpu allocator on
the same footing as vmalloc. Can you see whether this is enough?
Note that this patch isn't upstreamable. We definitely want to
restrict this to the rebalance path, but it should be good enough for
testing.
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9158e5a..0b4739f 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -81,7 +81,7 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
 static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
 			    struct page **pages, int page_start, int page_end)
 {
-	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
+	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_NORETRY;
 	unsigned int cpu, tcpu;
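
For context, __GFP_NORETRY tells the page allocator to bail out after its
first reclaim attempt rather than loop (and eventually invoke the OOM
killer), so any path using it must tolerate NULL and unwind. A minimal
kernel-style sketch of that caller pattern (the function name here is
illustrative, not from the patch; it is not compilable outside the kernel
tree):

```
/* Sketch only: with __GFP_NORETRY the allocation can fail quickly
 * under memory pressure, so the caller checks for NULL, frees any
 * pages it already got, and returns -ENOMEM instead of relying on
 * the OOM killer to make forward progress. */
static int example_alloc_pages(struct page **pages, int nr)
{
	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_NORETRY;
	int i;

	for (i = 0; i < nr; i++) {
		pages[i] = alloc_page(gfp);
		if (!pages[i]) {
			while (--i >= 0)
				__free_page(pages[i]);
			return -ENOMEM;	/* fail the request, no OOM kill */
		}
	}
	return 0;
}
```

This is why the flag alone can only lower the probability of an OOM: as
noted above, the page-table allocations under the vmalloc area mapping are
still hard-coded to GFP_KERNEL in arch mm code and can't be converted this
way.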