On 31.08.2015 16:59, Konstantin Khorenko wrote:
> Maxim, please review.
>
> Do we need the same in PCS6?

yes, backport to rh6 is required because of
https://bugs.openvz.org/browse/OVZ-6293

[1294400.800014] INFO: task jbd2/ploop27926:1692 blocked for more than 120 seconds.
[1294400.800121] Not tainted 2.6.32-042stab108.1 #1
[1294400.800177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1294400.800285] jbd2/ploop279 D ffff88081950e400 0 1692 2 0 0x00000080
[1294400.800393] ffff88080dde4fe0 0000000000000046 0000000000000000 ffff88060dac1000
[1294400.800502] 0005120000000000 ffff88081b2c4ae0 0004991706322912 0000000000000002
[1294400.800615] ffff88080dde4fb8 0000000000000000 000000014d2b4e18 0000000000000000
[1294400.800724] Call Trace:
[1294400.800777] [<ffffffff81536075>] schedule_timeout+0x215/0x2e0
[1294400.800836] [<ffffffff81015059>] ? read_tsc+0x9/0x20
[1294400.800893] [<ffffffff810b4801>] ? ktime_get_ts+0xb1/0xf0
[1294400.800951] [<ffffffff81015059>] ? read_tsc+0x9/0x20
[1294400.801008] [<ffffffff810b4801>] ? ktime_get_ts+0xb1/0xf0
[1294400.801066] [<ffffffff8153460f>] io_schedule_timeout+0x7f/0xd0
[1294400.801125] [<ffffffff815359e4>] wait_for_completion_io+0xe4/0x120
[1294400.801185] [<ffffffff81065c00>] ? default_wake_function+0x0/0x20
[1294400.801246] [<ffffffff81286126>] blkdev_issue_discard+0x216/0x230
[1294400.801305] [<ffffffff81189875>] scan_swap_map+0x385/0x640
[1294400.801362] [<ffffffff81189c6d>] get_swap_page+0x9d/0x140
[1294400.801420] [<ffffffff81186e87>] add_to_swap+0x17/0x90
[1294400.801478] [<ffffffff8115a6ce>] shrink_page_list.clone.0+0x2de/0x900
[1294400.801537] [<ffffffff8115bece>] shrink_inactive_list+0x3be/0xb10
[1294400.801600] [<ffffffff81015059>] ? read_tsc+0x9/0x20
[1294400.801660] [<ffffffff8115ca50>] shrink_lruvec+0x430/0x600
[1294400.801718] [<ffffffff8115cea7>] shrink_zone+0x287/0x3d0
[1294400.801775] [<ffffffff8115ec28>] do_try_to_free_pages+0x588/0xa60
[1294400.801834] [<ffffffff8115f31b>] try_to_free_pages+0x8b/0x120
[1294400.801894] [<ffffffff8114fde1>] __alloc_pages_nodemask+0x671/0xb50
[1294400.801954] [<ffffffff8119ac79>] kmem_getpages+0x59/0x140
[1294400.802011] [<ffffffff8119cdcb>] fallback_alloc+0x1bb/0x260
[1294400.802069] [<ffffffff8119cb59>] ____cache_alloc_node+0x99/0x150
[1294400.802128] [<ffffffff8119db63>] kmem_cache_alloc+0x173/0x1e0
[1294400.802186] [<ffffffff81140de5>] mempool_alloc_slab+0x15/0x20
[1294400.802244] [<ffffffff81140f87>] mempool_alloc+0x67/0x170
[1294400.802302] [<ffffffff8119ac79>] ? kmem_getpages+0x59/0x140
[1294400.802361] [<ffffffff811f494e>] bio_alloc_bioset+0x3e/0xf0
[1294400.802419] [<ffffffff811f4aa5>] bio_alloc+0x15/0x30
[1294400.802478] [<ffffffffa04880e8>] ploop_make_request+0x5a8/0xa30 [ploop]
[1294400.802538] [<ffffffff8127d780>] generic_make_request+0x240/0x550
[1294400.802599] [<ffffffff8127db13>] submit_bio+0x83/0x1c0
[1294400.802659] [<ffffffff811f496b>] ? bio_alloc_bioset+0x5b/0xf0
[1294400.802719] [<ffffffff811ee41d>] submit_bh+0x11d/0x1e0
[1294400.802778] [<ffffffffa00a8c18>] jbd2_journal_commit_transaction+0x5a8/0x1500 [jbd2]
[1294400.802888] [<ffffffff810097dd>] ? __switch_to+0x13d/0x320
[1294400.802946] [<ffffffff8108f90b>] ? try_to_del_timer_sync+0x7b/0xe0
[1294400.803008] [<ffffffffa00aebb8>] kjournald2+0xb8/0x220 [jbd2]
[1294400.803066] [<ffffffff810a8860>] ? autoremove_wake_function+0x0/0x40
[1294400.803127] [<ffffffffa00aeb00>] ? kjournald2+0x0/0x220 [jbd2]
[1294400.803185] [<ffffffff810a846e>] kthread+0x9e/0xc0
[1294400.803242] [<ffffffff8100c38a>] child_rip+0xa/0x20
[1294400.803298] [<ffffffff810a83d0>] ? kthread+0x0/0xc0
[1294400.803355] [<ffffffff8100c380>] ? child_rip+0x0/0x20

> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
>
> On 08/17/2015 04:30 PM, Vladimir Davydov wrote:
>> Currently, we use GFP_NOFS, which may result in a deadlock as follows:
>>
>> filemap_fault
>>  do_mpage_readpage
>>   submit_bio
>>    generic_make_request                initializes current->bio_list
>>                                        calls make_request_fn
>>     ploop_make_request
>>      bio_alloc(GFP_NOFS)
>>       kmem_cache_alloc
>>        memcg_charge_kmem
>>         try_to_free_mem_cgroup_pages
>>          swap_writepage
>>           generic_make_request         puts bio on current->bio_list
>>         try_to_free_mem_cgroup_pages
>>          wait_on_page_writeback
>>
>> The wait_on_page_writeback will never complete then, because the
>> corresponding bio is on current->bio_list and for it to get to the queue
>> we must return from ploop_make_request first.
>>
>> The stack trace of a hung task:
>>
>> [<ffffffff8115ae2e>] sleep_on_page+0xe/0x20
>> [<ffffffff8115abb6>] wait_on_page_bit+0x86/0xb0
>> [<ffffffff8116f4b2>] shrink_page_list+0x6e2/0xaf0
>> [<ffffffff8116ff2b>] shrink_inactive_list+0x1cb/0x610
>> [<ffffffff81170ab5>] shrink_lruvec+0x395/0x790
>> [<ffffffff81171031>] shrink_zone+0x181/0x350
>> [<ffffffff811715a0>] do_try_to_free_pages+0x170/0x530
>> [<ffffffff81171b76>] try_to_free_mem_cgroup_pages+0xb6/0x140
>> [<ffffffff811c6b5e>] __mem_cgroup_try_charge+0x1de/0xd70
>> [<ffffffff811c8c4b>] memcg_charge_kmem+0x9b/0x100
>> [<ffffffff811c8e1b>] __memcg_charge_slab+0x3b/0x90
>> [<ffffffff811b3664>] new_slab+0x264/0x3f0
>> [<ffffffff815e97c6>] __slab_alloc+0x315/0x48f
>> [<ffffffff811b49ac>] kmem_cache_alloc+0x1cc/0x210
>> [<ffffffff8115e4b5>] mempool_alloc_slab+0x15/0x20
>> [<ffffffff8115e5f9>] mempool_alloc+0x69/0x170
>> [<ffffffff8120bd42>] bvec_alloc+0x92/0x120
>> [<ffffffff8120bfb8>] bio_alloc_bioset+0x1e8/0x2e0
>> [<ffffffffa0072246>] ploop_make_request+0x2a6/0xac0 [ploop]
>> [<ffffffff81297172>] generic_make_request+0xe2/0x130
>> [<ffffffff81297237>] submit_bio+0x77/0x1c0
>> [<ffffffff8121341f>] do_mpage_readpage+0x37f/0x6e0
>> [<ffffffff8121386b>] mpage_readpages+0xeb/0x160
>> [<ffffffffa01a051c>] ext4_readpages+0x3c/0x40 [ext4]
>> [<ffffffff811683c0>] __do_page_cache_readahead+0x1e0/0x260
>> [<ffffffff81168b11>] ra_submit+0x21/0x30
>> [<ffffffff8115dea1>] filemap_fault+0x321/0x4b0
>> [<ffffffff811864ca>] __do_fault+0x8a/0x560
>> [<ffffffff8118b2d0>] handle_mm_fault+0x3d0/0xd80
>> [<ffffffff815f73ee>] __do_page_fault+0x15e/0x530
>> [<ffffffff815f77da>] do_page_fault+0x1a/0x70
>> [<ffffffff815f3a08>] page_fault+0x28/0x30
>>
>> https://jira.sw.ru/browse/PSBM-38842
>>
>> Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
>> ---
>>  drivers/block/ploop/dev.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>> index 30eb8a7551e5..f37df4dacf8c 100644
>> --- a/drivers/block/ploop/dev.c
>> +++ b/drivers/block/ploop/dev.c
>> @@ -717,7 +717,7 @@ preallocate_bio(struct bio * orig_bio, struct ploop_device * plo)
>>  	}
>>
>>  	if (nbio == NULL)
>> -		nbio = bio_alloc(GFP_NOFS, max(orig_bio->bi_max_vecs, block_vecs(plo)));
>> +		nbio = bio_alloc(GFP_NOIO, max(orig_bio->bi_max_vecs, block_vecs(plo)));
>>  	return nbio;
>>  }
>>
>> @@ -852,7 +852,7 @@ static void ploop_make_request(struct request_queue *q, struct bio *bio)
>>
>>  	if (!current->io_context) {
>>  		struct io_context *ioc;
>> -		ioc = get_task_io_context(current, GFP_NOFS, NUMA_NO_NODE);
>> +		ioc = get_task_io_context(current, GFP_NOIO, NUMA_NO_NODE);
>>  		if (ioc)
>>  			put_io_context(ioc);
>>  	}
>>
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://lists.openvz.org/mailman/listinfo/devel