On Mon, 16 Mar 2026 09:45:49 +0100 Thomas Zimmermann <[email protected]> wrote:
> Hi Boris,
> 
> thanks for investigating this problem.
> 
> Am 13.03.26 um 18:45 schrieb Boris Brezillon:
> > On Fri, 13 Mar 2026 13:55:21 +0100
> > Boris Brezillon <[email protected]> wrote:
> > 
> >> On Fri, 13 Mar 2026 13:43:28 +0100
> >> Boris Brezillon <[email protected]> wrote:
> >> 
> >>> On Fri, 13 Mar 2026 13:18:35 +0100
> >>> Boris Brezillon <[email protected]> wrote:
> >>> 
> >>>> On Fri, 13 Mar 2026 12:04:25 +0000
> >>>> Biju Das <[email protected]> wrote:
> >>>> 
> >>>>>> -----Original Message-----
> >>>>>> From: dri-devel <[email protected]> On Behalf Of
> >>>>>> Boris Brezillon
> >>>>>> Sent: 13 March 2026 11:57
> >>>>>> Subject: Re: [PATCH v4 5/6] drm/gem-shmem: Track folio accessed/dirty
> >>>>>> status in mmap
> >>>>>> 
> >>>>>> On Fri, 13 Mar 2026 11:29:47 +0100
> >>>>>> Thomas Zimmermann <[email protected]> wrote:
> >>>>>> 
> >>>>>>> Hi
> >>>>>>> 
> >>>>>>> Am 13.03.26 um 11:18 schrieb Boris Brezillon:
> >>>>>>> [...]
> >>>>>>>>>>>> +	if (drm_WARN_ON(obj->dev, !shmem->pages || page_offset >= num_pages))
> >>>>>>>>>>>> +		return VM_FAULT_SIGBUS;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	file_update_time(vma->vm_file);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	folio_mark_dirty(page_folio(shmem->pages[page_offset]));
> >>>>>>>> Do we need a folio_mark_dirty_lock() here?
> >>>>>>> There is a helper for that with some documentation. [1]
> >>>>>> This [1] seems to solve the problem for me. Still unsure about
> >>>>>> folio_mark_dirty_lock vs folio_mark_dirty though.
> >>>>>> 
> >>>>>> [1] https://yhbt.net/lore/dri-devel/[email protected]/
> >>>>>> 
> >>>>> FYI, I used folio_mark_dirty_lock(); still, it does not solve the
> >>>>> issue with the weston hang.
> >>>> The patch I pointed to has nothing to do with folio_mark_dirty_lock(),
> >>>> it's a bug caused by the huge page mapping changes.
> >>> Scratch that. I had a bunch of other changes on top, and it hangs again
> >>> now that I dropped those.
> >> Seems like it's the combination of huge pages and mkwrite that's
> >> causing issues; if I disable huge pages, it doesn't hang...
> > I managed to have it working with the following diff. I still need to
> > check why the "map-RO-split+RW-on-demand" approach doesn't work (races
> > between huge_fault and pfn_mkwrite?), but I think it's okay to map the
> > real thing writable on the first attempt anyway (we're not trying to do
> > CoW here, since we're always pointing to the same page, it's just the
> > permissions that change). Note that there's still the race fixed by
> > https://yhbt.net/lore/dri-devel/[email protected]/
> > in this diff, I just tried to keep the diffstat minimal.

^ "that's not present in this diff", sorry for the confusion.

Aside from that, I've been looking more closely at the code in
mm/memory.c and at other implementations of .pfn_mkwrite(), and I'm
still not confident that:

- we can call folio_mark_dirty() without the folio lock held in that
  path unless we have the GEM resv lock held (the pte lock is released,
  and I'm not sure there's anything else holding on to the folio).

- we can claim that the huge vs normal-page paths are race-free. That's
  probably okay as long as we only do the dirty bookkeeping in
  pfn_mkwrite() (we might flag the folio dirty before we know the
  writable mapping has been set up properly, but that's probably okay).

What worries me a bit is the fact that most implementations call their
fault handler from pfn_mkwrite() and do the page table update from
there. There's also this comment [1] that makes me doubt we're doing
the right thing here.

It would be good if someone from MM could chime in and shed some light
on what's supposed to happen in pfn_mkwrite() (Matthew, perhaps?).

[1] https://elixir.bootlin.com/linux/v6.17.2/source/fs/xfs/xfs_file.c#L1899
