Hi, Vladimir! Thanks for the quick response.
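One detail in the first backtrace quoted below seems consistent with the "random memory" observation: RDX holds 79726f6765746163, which is not a canonical x86-64 address and which reads as printable text when its bytes are taken in little-endian order. A minimal sketch to check this, assuming 48-bit virtual addresses (4-level paging); the helper names are hypothetical, and whether RDX was actually derived from the bad memcg pointer is an assumption, not something the dump excerpt proves:

```python
import struct


def decode_register_as_ascii(value: int) -> str:
    """Render a 64-bit register value as text, bytes in little-endian (memory) order."""
    raw = struct.pack("<Q", value)
    # Non-printable bytes are shown as '.' so pointers do not produce garbage output.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical_x86_64(value: int) -> bool:
    """Assumes 48-bit virtual addresses: bits 63..47 must be all zeros or all ones."""
    top = value >> 47
    return top == 0 or top == 0x1FFFF


rdx = 0x79726F6765746163  # RDX from the first backtrace
rdi = 0xFFFF8806A55662F8  # RDI from the first backtrace, a normal kernel address

print(decode_register_as_ascii(rdx))  # category
print(is_canonical_x86_64(rdx))       # False
print(is_canonical_x86_64(rdi))       # True
```

A dereference of a non-canonical address raises a general protection fault rather than a page fault, which would match the do_general_protection frame in that trace.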
I created a JIRA issue and uploaded the dumps.
All the information is in the JIRA issue: https://bugs.openvz.org/browse/OVZ-6756

On Wed, Jun 15, 2016 at 11:47 AM, Vladimir Davydov <[email protected]> wrote:
> Hi,
>
> Thanks for the report.
>
> Could you please
>
> - file a bug to bugzilla.openvz.org
>
> - upload the vmcore at
> rsync://fe.sw.ru/f837d67c8e2ade8cee3367cb0f880268/
>
> On Mon, Jun 13, 2016 at 09:24:33AM +0300, Anatoly Stepanov wrote:
>> Hello everyone!
>>
>> We encounter an issue with mem_cgroup_uncharge_page() function,
>> it appears quite often on our clients servers.
>>
>> Basically the issue sometimes leads to hard-lockup, sometimes to GP fault.
>>
>> Based on bug reports from clients, the problem shows up when a user
>> process calls "execve" or "exit" syscalls.
>> As we know in those cases kernel invokes "uncharging" for every page
>> when its unmapped from all the mm's.
>>
>> Kernel dump analysis shows that at the moment of
>> mem_cgroup_uncharge_page() "memcg" pointer
>> (taken from page_cgroup) seems to be pointing to some random memory area.
>>
>> On the other hand, if we look at current->mm->css, then memcg instance
>> exists and is "online".
>>
>> This led me to a thought that "page_cgroup->memcg" may be changed by
>> some part of memcg code in parallel.
>> As far as i understand, the only option here is "reclaim code path"
>> (may be i'm wrong)
>>
>> So, i suppose there might be a race between "memcg uncharge code" and
>> "memcg reclaim code".
>>
>> Please, give me your thoughts about it
>> thanks
>>
>> P.S.:
>>
>> Additional info:
>>
>> Kernel: rh7-3.10.0-327.10.1.vz7.12.14
>>
>> *************************************************1st
>> BT************************************************
>>
>> PID: 972445 TASK: ffff88065d53d8d0 CPU: 0 COMMAND: "httpd"
>> #0 [ffff880224f37818] machine_kexec at ffffffff8105249b
>> #1 [ffff880224f37878] crash_kexec at ffffffff81103532
>> #2 [ffff880224f37948] oops_end at ffffffff81641628
>> #3 [ffff880224f37970] die at ffffffff810184cb
>> #4 [ffff880224f379a0] do_general_protection at ffffffff81640f24
>> #5 [ffff880224f379d0] general_protection at ffffffff81640768
>>     [exception RIP: mem_cgroup_charge_statistics+19]
>>     RIP: ffffffff811e7733 RSP: ffff880224f37a80 RFLAGS: 00010202
>>     RAX: ffffffffffffffff RBX: ffff8807b26f0110 RCX: 00000000ffffffff
>>     RDX: 79726f6765746163 RSI: ffffea000c9c0440 RDI: ffff8806a55662f8
>>     RBP: ffff880224f37a80 R8: 0000000000000000 R9: 0000000003808000
>>     R10: 00000000000000b8 R11: ffffea001eaa8980 R12: ffffea000c9c0440
>>     R13: 0000000000000001 R14: 0000000000000000 R15: ffff8806a5566000
>>     ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #6 [ffff880224f37a88] __mem_cgroup_uncharge_common at ffffffff811e9ddf
>> #7 [ffff880224f37ac8] mem_cgroup_uncharge_page at ffffffff811ee99a
>> #8 [ffff880224f37ad8] page_remove_rmap at ffffffff811b9ec9
>> #9 [ffff880224f37b10] unmap_page_range at ffffffff811ab580
>> #10 [ffff880224f37bf8] unmap_single_vma at ffffffff811aba11
>> #11 [ffff880224f37c30] unmap_vmas at ffffffff811ace79
>> #12 [ffff880224f37c68] exit_mmap at ffffffff811b663c
>> #13 [ffff880224f37d18] mmput at ffffffff8107853b
>> #14 [ffff880224f37d38] flush_old_exec at ffffffff81202547
>> #15 [ffff880224f37d88] load_elf_binary at ffffffff8125883c
>> #16 [ffff880224f37e58] search_binary_handler at ffffffff81201c25
>> #17 [ffff880224f37ea0] do_execve_common at ffffffff812032b7
>> #18 [ffff880224f37f30] sys_execve at ffffffff81203619
>> #19 [ffff880224f37f50] stub_execve at ffffffff81649369
>>     RIP: 00007f54284b3287 RSP: 00007ffda57a0698 RFLAGS: 00000297
>>     RAX: 000000000000003b RBX: 00000000037c5fe8 RCX: ffffffffffffffff
>>     RDX: 00000000037cf3f8 RSI: 00000000037ce5f8 RDI: 00007f5425fcabf1
>>     RBP: 00007ffda57a0750 R8: 0000000000000001 R9: 0000000000000000
>>
>>
>> ***************************************2nd
>> BT**************************************************:
>>
>> PID: 168440 TASK: ffff88001e31cc20 CPU: 18 COMMAND: "httpd"
>> #0 [ffff88007255f838] machine_kexec at ffffffff8105249b
>> #1 [ffff88007255f898] crash_kexec at ffffffff81103532
>> #2 [ffff88007255f968] oops_end at ffffffff81641628
>> #3 [ffff88007255f990] no_context at ffffffff8163222b
>> #4 [ffff88007255f9e0] __bad_area_nosemaphore at ffffffff816322c1
>> #5 [ffff88007255fa30] bad_area_nosemaphore at ffffffff8163244a
>> #6 [ffff88007255fa40] __do_page_fault at ffffffff8164443e
>> #7 [ffff88007255faa0] trace_do_page_fault at ffffffff81644673
>> #8 [ffff88007255fad8] do_async_page_fault at ffffffff81643d59
>> #9 [ffff88007255faf0] async_page_fault at ffffffff816407f8
>>     [exception RIP: memcg_check_events+435]
>>     RIP: ffffffff811e9b53 RSP: ffff88007255fba0 RFLAGS: 00010246
>>     RAX: 00000000f81ef81e RBX: ffff8802106d5000 RCX: 0000000000000000
>>     RDX: 000000000000f81e RSI: 0000000000020000 RDI: ffff8807aa2642e8
>>     RBP: ffff88007255fbf0 R8: 0000000000000202 R9: 0000000000000000
>>     R10: 0000000000000010 R11: ffff88007255ffd8 R12: ffff8807aa2642e0
>>     R13: 0000000000000410 R14: ffff8802073de700 R15: ffff8802106d5000
>>     ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #10 [ffff88007255fbf8] __mem_cgroup_uncharge_common at ffffffff811e9df2
>> #11 [ffff88007255fc38] mem_cgroup_uncharge_page at ffffffff811ee99a
>> #12 [ffff88007255fc48] page_remove_rmap at ffffffff811b9ec9
>> #13 [ffff88007255fc80] unmap_page_range at ffffffff811ab580
>> #14 [ffff88007255fd68] unmap_single_vma at ffffffff811aba11
>> #15 [ffff88007255fda0] unmap_vmas at ffffffff811ace79
>> #16 [ffff88007255fdd8] exit_mmap at ffffffff811b663c
>> #17 [ffff88007255fe88] mmput at ffffffff8107853b
>> #18 [ffff88007255fea8] do_exit at ffffffff81081d8c
>> #19 [ffff88007255ff40] do_group_exit at ffffffff8108266f
>> #20 [ffff88007255ff70] sys_exit_group at ffffffff810826e4
>> #21 [ffff88007255ff80] system_call_fastpath at ffffffff81648dc9
>>     RIP: 00007fc210ea4259 RSP: 00007ffe20580fa8 RFLAGS: 00010206
>>     RAX: 00000000000000e7 RBX: ffffffff81648dc9 RCX: 0000000000000000
>>
>> *******************************************3rd
>> BT**********************************************:
>>
>> PID: 1003121 TASK: ffff880036b58000 CPU: 1 COMMAND: "httpd"
>> #0 [ffff880237a459c8] machine_kexec at ffffffff8105249b
>> #1 [ffff880237a45a28] crash_kexec at ffffffff81103532
>> #2 [ffff880237a45af8] panic at ffffffff816329b0
>> #3 [ffff880237a45b78] watchdog_overflow_callback at ffffffff8112cee2
>> #4 [ffff880237a45b88] __perf_event_overflow at ffffffff81171c11
>> #5 [ffff880237a45c00] perf_event_overflow at ffffffff811726e4
>> #6 [ffff880237a45c10] intel_pmu_handle_irq at ffffffff81032e98
>> #7 [ffff880237a45e60] perf_event_nmi_handler at ffffffff8164206b
>> #8 [ffff880237a45e80] nmi_handle at ffffffff816417b9
>> #9 [ffff880237a45ec8] do_nmi at ffffffff816418d0
>> #10 [ffff880237a45ef0] end_repeat_nmi at ffffffff81640b93
>>     [exception RIP: _raw_spin_lock+58]
>>     RIP: ffffffff8163ff7a RSP: ffff88003e16fa28 RFLAGS: 00000006
>>     RAX: 00000000000048f6 RBX: ffff8803edbab870 RCX: 0000000000006120
>>     RDX: 0000000000006362 RSI: 0000000000006362 RDI: ffff8803edbab898
>>     RBP: ffff88003e16fa28 R8: 0000000000000000 R9: 0000000002d98000
>>     R10: 0000000000002295 R11: ffffea0010d1f080 R12: 0000000000000000
>>     R13: ffff8803edbab870 R14: 0000000000000000 R15: ffff8803edbab898
>>     ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> --- <NMI exception stack> ---
>> #11 [ffff88003e16fa28] _raw_spin_lock at ffffffff8163ff7a
>> #12 [ffff88003e16fa30] res_counter_uncharge_until at ffffffff81114df9
>> #13 [ffff88003e16fa78] res_counter_uncharge at ffffffff81114e73
>> #14 [ffff88003e16fa88] __mem_cgroup_uncharge_common at ffffffff811e9e7c
>> #15 [ffff88003e16fac8] mem_cgroup_uncharge_page at ffffffff811ee99a
>> #16 [ffff88003e16fad8] page_remove_rmap at ffffffff811b9ec9
>> #17 [ffff88003e16fb10] unmap_page_range at ffffffff811ab580
>> #18 [ffff88003e16fbf8] unmap_single_vma at ffffffff811aba11
>> #19 [ffff88003e16fc30] unmap_vmas at ffffffff811ace79
>> #20 [ffff88003e16fc68] exit_mmap at ffffffff811b663c
>> #21 [ffff88003e16fd18] mmput at ffffffff8107853b
>> #22 [ffff88003e16fd38] flush_old_exec at ffffffff81202547
>> #23 [ffff88003e16fd88] load_elf_binary at ffffffff8125883c
>> #24 [ffff88003e16fe58] search_binary_handler at ffffffff81201c25
>> #25 [ffff88003e16fea0] do_execve_common at ffffffff812032b7
>> #26 [ffff88003e16ff30] sys_execve at ffffffff81203619
>> #27 [ffff88003e16ff50] stub_execve at ffffffff81649369
>>     RIP: 00007f54e8341287 RSP: 00007fffcd0d22e8 RFLAGS: 00000297
>>     RAX: 000000000000003b RBX: 0000000002d8b2a0 RCX: ffffffffffffffff
>>     RDX: 0000000002d8a810 RSI: 0000000002db4128 RDI: 00007f54e605cbf1
>>     RBP: 00007fffcd0d23a0 R8: 0000000000000001 R9: 0000000000000000
>>     R10: 00007fffcd0d2050 R11: 0000000000000297 R12: 0000000002d8a810
>>     R13: 0000000002db3a50 R14: 0000000002da8440 R15: 0000000000000000
>>     ORIG_RAX: 000000000000003b CS: 0033 SS: 002b

--
Best regards,
Anatoly Stepanov | Kernel Developer
Skype: digitolman
CloudLinux.com | KernelCare.com | KuberDock.com
helpdesk.cloudlinux.com: 24/7 Free, exceptionally good support
Follow twitter.com/CloudLinuxOS for technical updates
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
