On Sun, May 17, 2026 at 12:07:05AM +0100, Matthew Wilcox wrote:
> On Sat, May 16, 2026 at 07:21:26PM +0100, Pedro Falcato wrote:
> > +static bool may_write_to_page(struct page *page, struct address_space **plast)
> > +{
> > + struct folio *folio = page_folio(page);
> > + struct address_space *mapping, *last = *plast;
> > + struct inode *inode;
> > + bool may = false;
> > +
> > + if (!READ_ONCE(sysctl_splice_needs_write))
> > + return true;
> > + /*
> > + * Always fine to write to anon folios.
> > + */
> > + if (folio_test_anon(folio))
> > + return true;
>
> What about KSM? It's not something we've seen attacked yet, but it'd be
> pretty nasty to be able to change a KSM page in another process.
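It's my understanding that only anon pages can be KSM'd, and that KSM folios
still keep the FOLIO_MAPPING_ANON bit set in folio->mapping (alongside the KSM
bit). So folio_test_anon() should still test true for those, and we return
before ever treating ->mapping as an address_space.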
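To spell out the bit logic I have in mind, here is a rough userspace model
(not kernel code, and not compiled against the kernel headers). The MODEL_*
constants are my assumption of what FOLIO_MAPPING_ANON and
FOLIO_MAPPING_ANON_KSM look like, and every model_* name is made up for this
sketch:

```c
/*
 * Userspace model only. The MODEL_* values are assumed to mirror the
 * kernel's FOLIO_MAPPING_ANON / FOLIO_MAPPING_ANON_KSM bits; all model_*
 * names are hypothetical and exist only in this sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAPPING_ANON	((uintptr_t)0x1)
#define MODEL_MAPPING_ANON_KSM	((uintptr_t)0x2)
#define MODEL_MAPPING_KSM	(MODEL_MAPPING_ANON | MODEL_MAPPING_ANON_KSM)
#define MODEL_MAPPING_FLAGS	(MODEL_MAPPING_ANON | MODEL_MAPPING_ANON_KSM)

struct model_folio {
	uintptr_t mapping;	/* pointer value with flags in the low two bits */
};

/* Models folio_test_anon(): only the ANON bit is checked. */
static bool model_folio_test_anon(const struct model_folio *folio)
{
	return (folio->mapping & MODEL_MAPPING_ANON) != 0;
}

/* Models folio_test_ksm(): both flag bits must be set. */
static bool model_folio_test_ksm(const struct model_folio *folio)
{
	return (folio->mapping & MODEL_MAPPING_FLAGS) == MODEL_MAPPING_KSM;
}

/* Build a folio whose mapping is `ptr` tagged with `flags`. */
static struct model_folio model_make_folio(uintptr_t ptr, uintptr_t flags)
{
	struct model_folio folio = {
		.mapping = (ptr & ~MODEL_MAPPING_FLAGS) | flags,
	};

	return folio;
}
```

In this model a folio tagged with MODEL_MAPPING_KSM passes both the KSM test
and the anon test, while an untagged (page cache) mapping passes neither,
which is the property the early return relies on.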
>
> I just got off a flight, so hopefully I'm semicoherent.
>
> > + mapping = READ_ONCE(folio->mapping);
> > + WARN_ON((unsigned long) mapping & FOLIO_MAPPING_FLAGS);
> > +
> > + /* If it is the same (locklessly), then LGTM, proceed. */
> > + if (mapping == last)
> > + return true;
> > + /*
> > + * Else we have to recheck with the folio lock held, for mapping
> > + * stability. TODO: killable?
>
> I wouldn't've thought that'd be necessary. The folio can't be being
> read because it's mapped, and we won't map a folio until it's uptodate.
Makes sense. I'll drop the TODO and keep the plain folio_lock() then.
>
> > + */
> > + folio_lock(folio);
> > + mapping = folio_mapping(folio);
>
> I think you're safe to just look at folio->mapping here. You have a
> refcount on the folio so it can't be freed, and I'm not sure there's a
> way to transition from page cache folio to anon folio without taking a
> trip through the page allocator.
Yep, makes sense. I don't think there is either. The worst that can happen is
that the folio gets truncated while we hold a reference but not the lock, in
which case ->mapping is NULL once we do take the lock and the check below
catches it.
I think I just used the helper for the sake of using the helper, so I'll replace
it with a plain read of folio->mapping.
>
> > + /* May have been truncated, etc */
> > + if (!mapping)
> > + goto out_lock;
>
> typically we call this "out_unlock".
ACK, will rename it to out_unlock.
>
> > + inode = mapping->host;
> > + may = inode_owner_or_capable(&nop_mnt_idmap, inode) ||
> > + inode_permission(&nop_mnt_idmap, inode, MAY_WRITE) == 0;
> > + if (likely(may))
> > + *plast = mapping;
> > +out_lock:
> > + folio_unlock(folio);
> > + return may;
> > +}
>
> I don't have a problem with the idea, other than it's really sad we have
> to do this.
Indeed :/
--
Pedro