This is a backport of upstream (vanilla) commit 96c7a2ff21501691587e1ae969b83cbec8b78e08.
Under certain conditions there can be a lot of alloc_fdmem() invocations with
order <= PAGE_ALLOC_COSTLY_ORDER, for example when httpd is doing a lot of
fork() calls.

Real-life examples from our customers:

[532506.773243] httpd D ffff8803f5fecc20 0 939874 6606
[532506.773257] Call Trace:
[532506.773261]  [<ffffffff8163ce29>] schedule+0x29/0x70
[532506.773264]  [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
[532506.773272]  [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
[532506.773276]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
[532506.773280]  [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
[532506.773284]  [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
[532506.773288]  [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
[532506.773292]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
[532506.773296]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
[532506.773300]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
[532506.773310]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
[532506.773315]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
[532506.773320]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
[532506.773324]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
[532506.773327]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
[532506.773332]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
[532506.773337]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
[532506.773341]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
[532506.773344]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
[532506.773354]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
[532506.773358]  [<ffffffff8107a641>] do_fork+0xe1/0x320
[532506.773370]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
[532506.773376]  [<ffffffff81648299>] stub_clone+0x69/0x90
[532506.773380]  [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b

[513890.005271] httpd D ffff880425db7230 0 811718 6606
[513890.005279] Call Trace:
[513890.005282]  [<ffffffff8163ce29>] schedule+0x29/0x70
[513890.005284]  [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
[513890.005292]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
[513890.005296]  [<ffffffff8163c448>] io_schedule+0x18/0x20
[513890.005298]  [<ffffffff812c6268>] get_request+0x218/0x780
[513890.005303]  [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
[513890.005309]  [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
[513890.005311]  [<ffffffff812c3892>] generic_make_request+0xe2/0x130
[513890.005313]  [<ffffffff812c3957>] submit_bio+0x77/0x1c0
[513890.005318]  [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
[513890.005337]  [<ffffffff811bf959>] swap_writepage+0x39/0x80
[513890.005340]  [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
[513890.005343]  [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
[513890.005345]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
[513890.005348]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
[513890.005350]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
[513890.005353]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
[513890.005355]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
[513890.005358]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
[513890.005360]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
[513890.005362]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
[513890.005365]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
[513890.005367]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
[513890.005369]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
[513890.005371]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
[513890.005376]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
[513890.005378]  [<ffffffff8107a641>] do_fork+0xe1/0x320
[513890.005380]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
[513890.005382]  [<ffffffff81648299>] stub_clone+0x69/0x90

We observed that sometimes kswapd cannot keep up with this, which causes many
direct reclaim attempts, which in turn:

1. Increase iowait time due to congestion_wait
2. Increase the number of block requests per second due to page swapping and writeback
3. May induce OOMs

So it is better not to try that hard to allocate a contiguous area, and to
fall back to vmalloc() as soon as possible.

Signed-off-by: Anatoly Stepanov <[email protected]>
---
 fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/file.c b/fs/file.c
index 366d9bb..3f65ba0 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
 	 * vmalloc() if the allocation size will be considered "large" by the VM.
 	 */
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
 		if (data != NULL)
 			return data;
 	}
-- 
1.8.3.1
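
For context, a sketch of how fs/file.c:alloc_fdmem() reads with this one-liner
applied. The final vmalloc() fallback is not visible in the hunk above, so it
is assumed from the surrounding upstream code rather than taken from this diff:

#include <linux/slab.h>      /* kmalloc(), GFP_* flags */
#include <linux/vmalloc.h>   /* vmalloc() */

static void *alloc_fdmem(size_t size)
{
	/*
	 * Fall back to vmalloc() if the allocation size will be considered
	 * "large" by the VM.
	 */
	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
		/*
		 * __GFP_NORETRY: give up after a single reclaim/compaction
		 * attempt instead of looping in direct reclaim;
		 * __GFP_NOWARN keeps the expected failure quiet.
		 */
		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
		if (data != NULL)
			return data;
	}
	/* Contiguous allocation failed or was skipped: use vmalloc(). */
	return vmalloc(size);
}

The only change is the extra flag: for sizes at or below the costly order the
kernel still prefers a physically contiguous buffer, but it now gives up after
one reclaim attempt and lets vmalloc() satisfy the request instead of driving
direct reclaim as seen in the traces above.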
