On Mon, May 18, 2026 at 02:20:30PM +0200, Christian Brauner wrote:
> On Sat, May 16, 2026 at 07:21:26PM +0100, Pedro Falcato wrote:
> > Since the advent of vulns like Dirty Pipe, Dirty Frag, Copy Fail
> > and Fragnasia, splicing a read-only file is fundamentally unsafe.
> >
> > As such, as a mitigation, add a way for users to block splice() for
> > files they cannot write to. This eliminates this whole class of exploits
> > that use splice()+confusion in pipe/net/etc code to gain write-access to
> > files they can only read.
> >
> > Users can simply toggle fs.splice_needs_write=1 and suddenly splice() will
> > refuse perfectly legal splices() from files it can only read, but not write.
> >
> > For vmsplice(), make due with the address_space attached to the folio. Care
> > is held to make sure the operation isn't too slowed down with locks. The
> > check itself isn't entirely equivalent (the mapping's host can be the
> > internal bdev inode, etc, and not the one in /dev against which permissions
> > are checked), but doing it in a more correct way would require dropping from
> > GUP-fast to GUP, and that would be too slow.
> >
> > Signed-off-by: Pedro Falcato <[email protected]>
> > ---
> >
> > Hello,
> >
> > sending this out as an RFC so I can get better opinions from VFS & security
> > folks upstream. I wrote this out as a way to harden against all the page
> > cache attacks we've seen lately, that bottom out to splice() from a file
> > they cannot write + confusion elsewhere on the net stack/pipes/etc.
> >
> > This is _obviously_ not perfect and not complete. My first (unsent) version
> > straight up returned -EPERM on splice() for these files. This one attempts
> > to retain some compatibility by only blocking the page splicing operation,
> > but still issuing the operation with normal copies (kindly suggested by
> > Jan).
> > vmsplice() is a complicated issue, because gup_fast does not allow us access
> > to the VMA's vm_file. I tried hacking around it but it's not perfect (e.g
> > you cannot grab the mnt_idmap for the file, since we only have access to the
> > address_space + its host).
> > I'm also not a fan of having somewhat hairy MM code in the middle of
> > fs/splice.c but that's something we can simply hoist elsewhere as this gets
> > un-RFC'd. It's also missing the external-facing docs for the sysctl.
> >
> > My big questions are:
> > 1) Is this a viable way forward?
>
> I think that splice and vmsplice() are pretty wonky apis. Ignoring it's
> recent prominent role in page cache attacks it suffers from weird issues
> due to its interactions with pipe_lock().
>
> Bug with splice to a pipe preventing a process exit
> [email protected]
> Sendfile holding pipe->mutex blocks the peer's pipe_release() from do_exit().
>
> Change in splice() behaviour after 5.10? (LTP splice07)
> [email protected]
>
> [PATCH v2 00/11] Avoid unprivileged splice(file->)/(->socket) pipe exclusion
> [email protected]
> Pending splice from tty/socket/FIFO holds pipe->mutex indefinitely, blocking
> all other FIFO ops incl. read(O_NONBLOCK)
>
> splice: prevent deadlock when splicing a file to itself
> [email protected]
> do_splice_direct_actor() still lacks file_inode(in) == file_inode(out) guard
>
> AF_UNIX/zerocopy/pipe/vmsplice/splice vs FOLL_PIN
> [email protected]
> vmsplice/splice into AF_UNIX/pipe doesn't FOLL_PIN the source memory
>
> My main gripe with the patch as written is that I find it really hard to
> figure out who would deploy this. It half-cripples splice() and
> vmsplice() for some use-cases but leaves it intact for others.

Not just splice() and vmsplice(), but sendfile() and copy_file_range() too. My
bet (perhaps not informed enough) is that there simply aren't that many users
doing splice-like operations from files they do not own in some way. (maybe not
true for copy_file_range(), I admit)

> At that point you can also just ENOSYS splice() and vmsplice() via
> seccomp and force a fallback on non-splice codepaths that userspace has
> to have anyway as splice() isn't supported unconditionally.

IIRC GNU grep is one simple example where they assume splice() from a pipe to
/dev/null Just Works(tm) and it exits(1) otherwise.

> It feels like a knee-jerk reaction to an exploit class originating in
> buggy modules that we have little control over and we would extend an
> API to users that is really difficult to use.
>
> What might make more sense is to add a splice specific security_*() hook
> into the code so that an LSM can deny usage of splice in whatever way it
> wants to - bpf lsm or in-tree lsm.

I don't dislike that option, but I don't love leaving hardening to LSMs. The
kernel quite literally gets a new splice-related vulnerability every week now,
where userspace gets to pass pages it has no business passing to funky
codepaths that then write on these pages. I feel like natively restricting what
you can pass is simply a natural way forward.

> Then we don't have to have all this gunk in the VFS layer that will be
> annoying to maintain with little value in the long-term. So I'm not very
> likely to pick this up as is.

Totally. That's what the RFC tag is for :)

--
Pedro
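
For reference, the seccomp route mentioned above could look roughly like the
userspace sketch below. It is not part of the patch. The helper names
deny_splice_syscalls() and copy_with_fallback() are made up for illustration,
the filter skips the seccomp_data.arch check a real filter needs, and treating
only ENOSYS/EINVAL as "fall back" is an assumption about what a caller wants.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for splice() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>

/* Hypothetical helper: make splice() and vmsplice() fail with ENOSYS. */
int deny_splice_syscalls(void)
{
	struct sock_filter filter[] = {
		/* A = syscall number. A real filter must check .arch first. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* splice: skip one instruction and land on the ERRNO return. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_splice, 1, 0),
		/* vmsplice: fall through to ERRNO, anything else: ALLOW. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_vmsplice, 0, 1),
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Needed so an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/*
 * Hypothetical helper: move up to len bytes from in_fd into a pipe, trying
 * splice() first and falling back to a plain read()/write() copy (at most
 * one 4096-byte chunk per call) when splice() is unavailable.
 */
ssize_t copy_with_fallback(int in_fd, int pipe_wr, size_t len,
			   int *used_fallback)
{
	char buf[4096];
	ssize_t n;

	*used_fallback = 0;
	n = splice(in_fd, NULL, pipe_wr, NULL, len, 0);
	if (n >= 0 || (errno != ENOSYS && errno != EINVAL))
		return n;

	*used_fallback = 1;
	if (len > sizeof(buf))
		len = sizeof(buf);
	n = read(in_fd, buf, len);
	if (n <= 0)
		return n;
	return write(pipe_wr, buf, (size_t)n);
}
```

A caller that only ever goes through something like copy_with_fallback() keeps
working once the filter is installed, whereas a program that calls splice()
directly and treats any failure as fatal (the grep case above) would not.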

