Re: lost connection to test machine (4)

Eric Dumazet Tue, 13 Feb 2018 05:36:07 -0800

On Mon, 2018-02-12 at 12:05 -0800, Tejun Heo wrote:
> On Mon, Feb 12, 2018 at 09:03:25AM -0800, Tejun Heo wrote:
> > Hello, Daniel.
> > 
> > On Mon, Feb 12, 2018 at 06:00:13PM +0100, Daniel Borkmann wrote:
> > > [ +Dennis, +Tejun ]
> > > 
> > > Looks like we're stuck in percpu allocator with key/value size of 4 bytes
> > > each and large number of entries (max_entries) in the reproducer in above
> > > link.
> > > 
> > > Could we have some __GFP_NORETRY semantics and let allocations fail
> > > instead of triggering OOM killer?
> > 
> > For some part, maybe, but not generally.  The virt area allocation
> > goes down to page table allocation which is hard coded to use
> > GFP_KERNEL in arch mm code.
> 
> So, the following should convert majority of allocations to use
> __GFP_NORETRY.  It doesn't catch everything but should significantly
> lower the probability of hitting this and put this on the same footing
> as vmalloc.  Can you see whether this is enough?
> 
> Note that this patch isn't upstreamable.  We definitely want to
> restrict this to the rebalance path, but it should be good enough for
> testing.
> 
> Thanks.
> 
> diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> index 9158e5a..0b4739f 100644
> --- a/mm/percpu-vm.c
> +++ b/mm/percpu-vm.c
> @@ -81,7 +81,7 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
>  static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
>                           struct page **pages, int page_start, int page_end)
>  {
> -     const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
> +     const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_NORETRY;
>       unsigned int cpu, tcpu;
>       int i;
>


Also, I would consider using this fix, as I had warnings of CPUs being
stuck there for more than 50 ms:


diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9158e5a81391ced4e268e3d5dd9879c2bc7280ce..6309b01ceb357be01e857e5f899429403836f41f 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -92,6 +92,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
                        *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
                        if (!*pagep)
                                goto err;
+                       cond_resched();
                }
        }
        return 0;
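
To show the shape of that loop outside the kernel, here is a rough userspace analogy. It is only a sketch under stated assumptions: sched_yield() stands in for cond_resched(), malloc() stands in for alloc_pages_node(), and every name in it (sketch_alloc_pages, sketch_free_pages, SKETCH_PAGE_SIZE) is hypothetical and does not exist in mm/percpu-vm.c.

```c
/*
 * Userspace analogy only, not kernel code.
 * sched_yield() plays the role of cond_resched(); malloc() plays the
 * role of alloc_pages_node(). All identifiers here are hypothetical.
 */
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Allocate nr_pages buffers for each of nr_cpus slots.
 * 'pages' must have room for nr_cpus * nr_pages pointers.
 * Returns 0 on success, or -1 on failure with every buffer freed.
 */
static int sketch_alloc_pages(void **pages, int nr_cpus, int nr_pages)
{
	int cpu = 0, i = 0, k;

	assert(pages && nr_cpus > 0 && nr_pages > 0);

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		for (i = 0; i < nr_pages; i++) {
			void **pagep = &pages[cpu * nr_pages + i];

			*pagep = malloc(SKETCH_PAGE_SIZE);
			if (!*pagep)
				goto err;
			/* Offer the CPU to other runnable tasks each iteration. */
			sched_yield();
		}
	}
	return 0;

err:
	/* Unwind: free every slot filled before the failing one. */
	for (k = cpu * nr_pages + i - 1; k >= 0; k--) {
		free(pages[k]);
		pages[k] = NULL;
	}
	return -1;
}

/* Release everything obtained by a successful sketch_alloc_pages(). */
static void sketch_free_pages(void **pages, int nr_cpus, int nr_pages)
{
	int k;

	for (k = 0; k < nr_cpus * nr_pages; k++) {
		free(pages[k]);
		pages[k] = NULL;
	}
}
```

The point is only that a loop whose trip count scales with (number of CPUs) x (pages per chunk) gets an explicit yield point per iteration, so a large request cannot monopolize a CPU for tens of milliseconds.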
