On Tue, 5 Aug 2008, Balbir Singh wrote:
> Hugh Dickins wrote:
> [snip]
> >
> > BUG: unable to handle kernel paging request at 6b6b6b8b
> > IP: [<7817078f>] memrlimit_cgroup_uncharge_as+0x18/0x29
> > Pid: 22500, comm: swapoff Not tainted (2.6.26-rc8-mm1 #7)
> >  [<78161323>] ? exit_mmap+0xaf/0x133
> >  [<781226b1>] ? mmput+0x4c/0xba
> >  [<78165ce3>] ? try_to_unuse+0x20b/0x3f5
> >  [<78371534>] ? _spin_unlock+0x22/0x3c
> >  [<7816636a>] ? sys_swapoff+0x17b/0x37c
> >  [<78102d95>] ? sysenter_past_esp+0x6a/0xa5
>
> I am unable to reproduce the problem,
Me neither, I've spent many hours trying 2.6.27-rc1-mm1 and then
back to 2.6.26-rc8-mm1.  But I've been SO stupid: saw it originally
on one machine with SLAB_DEBUG=y, have been trying since mostly on
another with SLUB_DEBUG=y, but never thought to boot with
slub_debug=P,task_struct until now.

> but I do have an initial hypothesis
>
> CPU0                                  CPU1
>                                       try_to_unuse
> task 1 starts exiting                 look at mm = task1->mm
> ..                                    increment mm_users
> task 1 exits
> mm->owner needs to be updated, but
> no new owner is found
> (mm_users > 1, but no other task
> has task->mm = task1->mm)
> mm_update_next_owner() leaves
>
> grace period
>                                       user count drops, call mmput(mm)
> task 1 freed
>                                       dereferencing mm->owner fails

Yes, that looks right to me: seems obvious now.  I don't think your
careful alternation of CPU0/1 events at the end matters: the swapoff
CPU simply dereferences mm->owner after that task has gone.

(That's a shame, I'd always hoped that mm->owner->comm was going to
be good for use in mm messages, even when tearing down the mm.)

> I do have a potential solution in mind, but I want to make sure my
> hypothesis is correct.

It seems wrong that memrlimit_cgroup_uncharge_as should be called
after mm->owner may have been changed, even if it's to something safe.
But I forget the mm/task exit details, surely they're tricky.

By the way, is the ordering in mm_update_next_owner the best?
Would there be less movement if it searched amongst siblings before
it searched amongst children?  Ought it to make a first pass trying
to stay within the same cgroup?

Hugh