On Wed 24-04-13 12:42:55, Heiko Carstens wrote:
> On Thu, Apr 18, 2013 at 09:13:03AM +0200, Heiko Carstens wrote:
> > Ok, thanks for verifying! I'll look into it; hopefully I can reproduce it
> > here as well.
> 
> That seems to be a common code bug. I can easily trigger the VM_BUG_ON()
> below (when I force the system to swap):
> 
> [   48.347963] ------------[ cut here ]------------
> [   48.347972] kernel BUG at mm/memcontrol.c:3994!
> [   48.348012] illegal operation: 0001 [#1] SMP 
> [   48.348015] Modules linked in:
> [   48.348017] CPU: 1 Not tainted 3.9.0-rc8+ #38
> [   48.348020] Process mmap2 (pid: 635, task: 0000000029476100, ksp: 
> 000000002e91b938)
> [   48.348022] Krnl PSW : 0704f00180000000 000000000026552c 
> (__mem_cgroup_uncharge_common+0x2c4/0x33c)
> [   48.348032]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
> EA:3
>                Krnl GPRS: 0000000000000008 0000000000000009 000003d1002a9200 
> 0000000000000000
> [   48.348039]            0000000000000000 00000000006812d8 000003ffdf339000 
> 00000000321a6f98
> [   48.348043]            000003fffce11000 0000000000000000 0000000000000001 
> 000003d1002a9200
> [   48.348046]            0000000000000001 0000000000681b88 000000002e91bc18 
> 000000002e91bbd0
> [   48.348057] Krnl Code: 000000000026551e: c0e5fffaa2a1        brasl   
> %r14,1b9a60
>                           0000000000265524: a7f4ff7d            brc     
> 15,26541e
>                          #0000000000265528: a7f40001            brc     
> 15,26552a
>                          >000000000026552c: e3c0b8200124        stg     
> %r12,6176(%r11)
>                           0000000000265532: a7f4ff57            brc     
> 15,2653e0
>                           0000000000265536: e310b8280104        lg      
> %r1,6184(%r11)
>                           000000000026553c: a71b0001            aghi    %r1,1
>                           0000000000265540: e310b8280124        stg     
> %r1,6184(%r11)
> [   48.348099] Call Trace:
> [   48.348100] ([<000003d1002a91c0>] 0x3d1002a91c0)
> [   48.348102]  [<00000000002404aa>] page_remove_rmap+0xf2/0x16c
> [   48.348106]  [<0000000000232dc8>] unmap_single_vma+0x494/0x7d8
> [   48.348107]  [<0000000000233ac0>] unmap_vmas+0x50/0x74
> [   48.348109]  [<00000000002396ec>] unmap_region+0x9c/0x110
> [   48.348110]  [<000000000023bd18>] do_munmap+0x284/0x470
> [   48.348111]  [<000000000023bf56>] vm_munmap+0x52/0x70
> [   48.348113]  [<000000000023cf32>] SyS_munmap+0x3a/0x4c
> [   48.348114]  [<0000000000665e14>] sysc_noemu+0x22/0x28
> [   48.348118]  [<000003fffcf187b2>] 0x3fffcf187b2
> [   48.348119] Last Breaking-Event-Address:
> [   48.348120]  [<0000000000265528>] __mem_cgroup_uncharge_common+0x2c0/0x33c
> 
> Looking at the code, the code flow is:
> 
> page_remove_rmap() -> mem_cgroup_uncharge_page() -> 
> __mem_cgroup_uncharge_common()
> 
> Note that in mem_cgroup_uncharge_page() the page in question passed the check:
> 
> [...]
>         if (PageSwapCache(page))
>                 return;
> [...]
> 
> and just a couple of instructions later the VM_BUG_ON() within
> __mem_cgroup_uncharge_common() triggers:
> 
> [...]
>         if (mem_cgroup_disabled())
>                 return NULL;
> 
>         VM_BUG_ON(PageSwapCache(page));
> [...]
> 
> Which means that another cpu changed the pageflags concurrently. In fact,
> looking at the dump a different cpu is indeed busy with running kswapd.

Hmm, maybe I am missing something but it really looks like we can race
here. Reclaim path takes the page lock while zap_pte takes page table
lock so nothing prevents them from racing here:
shrink_page_list                zap_pte_range
  trylock_page                    pte_offset_map_lock
  add_to_swap                       page_remove_rmap
    /* Page can be still mapped */
    add_to_swap_cache                 atomic_add_negative(_mapcount)
      __add_to_swap_cache               mem_cgroup_uncharge_page
                                          (PageSwapCache(page)) && return
        SetPageSwapCache
                                          __mem_cgroup_uncharge_common
                                            VM_BUG_ON(PageSwapCache(page))

Maybe not many people run with CONFIG_DEBUG_VM enabled these days so we
do not this more often (even me testing configs are not consistent in
that regards and only few have it on). The only thing that changed in
this area recently is 0c59b89c which made the test VM_BUG_ON rather then
simple return in 3.6
And maybe the BUG_ON is too harsh as CgroupUsed should guarantee that
the uncharge will eventually go away. What do you think Johannes?
 
> So.. this seems to be somewhat broken. Anyone familiar with memcontrol?

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to