On Tue, Mar 03, 2026 at 10:25:28AM -0500, Steven Rostedt wrote: > On Tue, 3 Mar 2026 10:19:52 +0000 > Lorenzo Stoakes <[email protected]> wrote: > > > > > Setting VM_IO just to trigger a failure case in madvise() feels like a > > > > hack? I > > > > guess it'd do the trick though, but you're not going to be able to > > > > reclaim that > > > > memory, and you might get some unexpected behaviour in code paths that > > > > assume > > > > VM_IO means it's memory-mapped I/O... (for instance GUP will stop > > > > working, if > > > > you need that). > > > > > > Well, we don't reclaim that memory anyway. > > > > OK so maybe not such an issue then! As long as GUP not working with it > > wouldn't > > break anything? > > Yeah, we don't ever use get_user_page() for this memory. It's pretty much > all handled on the kernel side. User space just has it mapped as read only. > The kernel doesn't care what user space does with it.
OK. > > > > > Yeah, right now the accounting gets screwed up as the mappings get out of > > > sync when it is forked. > > > > Ack, is that in a way that could screw up things from a kernel point of > > view at > > all? Presumably there was some report that triggered this work, like an > > assert > > fail or something, or a warning? > > Yes, it triggered a warning. The start of this thread added a patch to allow > madvise to remove DONTCOPY from this memory without screwing things up > (by adding an open() handler for the vm_operations_struct). I mean that'd be preferable. > > > > > > If a user is explicitly going to an ftrace buffer and madvise()'ing it in > > random > > ways they've only themselves to blame for being so stupid :) > > > > As long as it's not exploitable in some way I don't think that's too much > > of an > > issue? > > > > It'd be nice to keep the semantics of 'don't copy on fork' if we could, > > even if > > some crazy users might override it with madvise(). > > OK, so should we add the VM_IO flag? Would be preferable not to lie about the mapping if possible :P that'd be a hack. > > > > > Kind of separate from the question, but I mean if it's kernel-allocated > > memory > > which you're managing separately it should surely be VM_PFNMAP? > > OK, I'm a bit confused. What do you mean "managing separately"? You mean > managing the user space side of things? if so then yes. > > Hmm, I haven't heard of VM_PFNMAP, can you explain more its use case. means either no struct page (e.g. could be I/O mapped device memory) or 'don't touch struct page it's not for userspace' e.g. kernel allocation. > > > > > It depends if you want to have a refcounted folio underneath though. If you > > do > > then yeah don't do that :) > > I have no idea what a refcounted folio would do here :-p Well you are currently treating this as a userland folio. > > > > > In general I'd suggest supporting the fork case if you can, or just let > > things > > be wrong if a user does something crazy and undoes the VM_DONTCOPY flag. > > OK, so don't add these flags and just allow the forking to happen as this > patch does (if they screw it up, it's their problem). > > This patch is here: > > https://lore.kernel.org/all/[email protected]/ > > I mean, we allow two separate tasks to mmap the same buffer, and they will > have the same issues as a fork would have. Thus, I guess the answer is to > apply this patch? Well note that vm_ops->open is also called on splitting a VMA (ok I guess you disable this I think you said) and also mremap()'ing (before it removes the old VMA, if MREMAP_DONTUNMAP is not set). Probably all that's fine right? If so then good, and yeah something not-VM_IO would be best I think! > > -- Steve Cheers, Lorenzo
