On Fri, Oct 17, 2025 at 2:00 AM Lorenzo Stoakes <[email protected]> wrote: > > On Mon, Oct 13, 2025 at 11:28:16PM -0700, Hugh Dickins wrote: > > On Mon, 13 Oct 2025, Kalesh Singh wrote: > > > > > The VMA count limit check in do_mmap() and do_brk_flags() uses a > > > strict inequality (>), which allows a process's VMA count to exceed > > > the configured sysctl_max_map_count limit by one. > > > > > > A process with mm->map_count == sysctl_max_map_count will incorrectly > > > pass this check and then exceed the limit upon allocation of a new VMA > > > when its map_count is incremented. > > > > > > Other VMA allocation paths, such as split_vma(), already use the > > > correct, inclusive (>=) comparison. > > > > > > Fix this bug by changing the comparison to be inclusive in do_mmap() > > > and do_brk_flags(), bringing them in line with the correct behavior > > > of other allocation paths. > > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > > Cc: <[email protected]> > > > Cc: Andrew Morton <[email protected]> > > > Cc: David Hildenbrand <[email protected]> > > > Cc: "Liam R. Howlett" <[email protected]> > > > Cc: Lorenzo Stoakes <[email protected]> > > > Cc: Mike Rapoport <[email protected]> > > > Cc: Minchan Kim <[email protected]> > > > Cc: Pedro Falcato <[email protected]> > > > Reviewed-by: David Hildenbrand <[email protected]> > > > Reviewed-by: Lorenzo Stoakes <[email protected]> > > > Reviewed-by: Pedro Falcato <[email protected]> > > > Acked-by: SeongJae Park <[email protected]> > > > Signed-off-by: Kalesh Singh <[email protected]> > > > --- > > > > > > Changes in v3: > > > - Collect Reviewed-by and Acked-by tags. > > > > > > Changes in v2: > > > - Fix mmap check, per Pedro > > > > > > mm/mmap.c | 2 +- > > > mm/vma.c | 2 +- > > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 644f02071a41..da2cbdc0f87b 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -374,7 +374,7 @@ unsigned long do_mmap(struct file *file, unsigned > > > long addr, > > > return -EOVERFLOW; > > > > > > /* Too many mappings? */ > > > - if (mm->map_count > sysctl_max_map_count) > > > + if (mm->map_count >= sysctl_max_map_count) > > > return -ENOMEM; > > > > > > /* > > > diff --git a/mm/vma.c b/mm/vma.c > > > index a2e1ae954662..fba68f13e628 100644 > > > --- a/mm/vma.c > > > +++ b/mm/vma.c > > > @@ -2797,7 +2797,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct > > > vm_area_struct *vma, > > > if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) > > > return -ENOMEM; > > > > > > - if (mm->map_count > sysctl_max_map_count) > > > + if (mm->map_count >= sysctl_max_map_count) > > > return -ENOMEM; > > > > > > if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT)) > > > -- > > > 2.51.0.760.g7b8bcc2412-goog > > > > Sorry for letting you go so far before speaking up (I had to test what > > I believed to be true, and had hoped that meanwhile one of your many > > illustrious reviewers would say so first, but no): it's a NAK from me. > > > > These are not off-by-ones: at the point of these checks, it is not > > known whether an additional map/vma will have to be added, or the > > addition will be merged into an existing map/vma. So the checks > > err on the lenient side, letting you get perhaps one more than the > > sysctl said, but not allowing any more than that. > > > > Which is all that matters, isn't it? Limiting unrestrained growth. > > > > In this patch you're proposing to change it from erring on the > > lenient side to erring on the strict side - prohibiting merges > > at the limit which have been allowed for many years. > > > > Whatever one thinks about the merits of erring on the lenient versus > > erring on the strict side, I see no reason to make this change now, > > and most certainly not with a Fixes Cc: stable. There is no danger > > in the current behaviour; there is danger in prohibiting what was > > allowed before. > > Thanks for highlighting this, but this is something that people just 'had > to know'. If so many maintainers are unaware that this is a requirement, > this is a sign that this is very unclear. > > So as I said to Kalesh elsewhere, this is something we really do need to > comment very clearly. > > Or perhaps have as a helper function to _very explicitly_ show that we're > making this allowance. > > I do agree we should err on the side of caution, though if you're at a > point where you're _about_ to hit the map count limit you're already > screwed really. > > But for the sake of avoiding breaking people who are doing crazy things (or > perhaps I'm not imaginative enough here :) yes let's leave it as is. > > But I really _do not_ want to see this global exported so, I think an > appropriate helper function or use of the newly introduced one with a > comment are vital. > > > > > As to the remainder of your series: I have to commend you for doing > > a thorough and well-presented job, but I cannot myself see the point in > > changing 21 files for what almost amounts to a max_map_count subsystem. > > I call it misdirected effort, not at all to my taste, which prefers the > > straightforward checks already there; but accept that my taste may be > > out of fashion, so won't stand in the way if others think it worthwhile. > > I disagree very much, I see value here. > > Avoiding referencing an ugly global is a big win in itself, but > self-documenting code has huge value. > > In general mm has had a habit of hiding information as to how things work > for a long time (when writing the book I really had to decode a _lot_ of > this kind of thing). > > I think it's time we moved away from this, and tried to make the code as > clear as possible.
Hi all, Thanks, Lorenzo, for the feedback and support. The consensus from the discussion appears to be: - Drop this patch keeping the existing check lenient, but make it more obvious and clear that this is the intended behavior. - Keep the vma_count_remaining() or another helper and abstracts away the direct use of the global sysctl_max_map_count. - Keep the rename of map_count to vma_count - Keep the selftest and tracepoint patches, which are important for documenting the behavior and providing observability. If there are no objections, I'll plan to send out a new version of the series. Thanks, Kalesh > > > > > Hugh > > Cheers, Lorenzo
