On 29 Apr 2026, at 11:29, Zi Yan wrote:

> This check ensures the correctness of read-only PMD folio collapse
> after it is enabled for all FSes supporting PMD pagecache folios and
> replaces READ_ONLY_THP_FOR_FS.
>
> READ_ONLY_THP_FOR_FS only supports read-only fd and uses
> mapping->nr_thps and inode->i_writecount to prevent any write to
> read-only to-be-collapsed folios. In upcoming commits,
> READ_ONLY_THP_FOR_FS will be removed and the aforementioned mechanism
> will go away too. To ensure khugepaged functions as expected after
> the changes, skip if any folio is dirty after try_to_unmap(), since a
> dirty folio at that point means this read-only folio can get writes
> between try_to_unmap() and try_to_unmap_flush() via cached TLB
> entries and khugepaged does not support writable pagecache folio
> collapse yet.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> Acked-by: David Hildenbrand (Arm) <[email protected]>
> ---
>  mm/khugepaged.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6808f2b48d864..71209a72195ab 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  			}
>  		} else if (folio_test_dirty(folio)) {
>  			/*
> -			 * khugepaged only works on read-only fd,
> -			 * so this page is dirty because it hasn't
> +			 * This page is dirty because it hasn't
>  			 * been flushed since first write. There
>  			 * won't be new dirty pages.
>  			 *
> @@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		if (!is_shmem && (folio_test_dirty(folio) ||
>  				  folio_test_writeback(folio))) {
>  			/*
> -			 * khugepaged only works on read-only fd, so this
> -			 * folio is dirty because it hasn't been flushed
> +			 * khugepaged only works on clean file-backed folios,
> +			 * so this folio is dirty because it hasn't been flushed
>  			 * since first write.
>  			 */
>  			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> @@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		goto out_unlock;
>  	}
>
> +	/*
> +	 * At this point, the folio is locked and unmapped. If the PTE
> +	 * was dirty, try_to_unmap() has transferred the dirty bit to
> +	 * the folio and we must not collapse it into a clean
> +	 * file-backed folio.
> +	 *
> +	 * If the folio is clean here, no one can write it until we
> +	 * drop the folio lock. A write through a stale TLB entry came
> +	 * from a clean PTE and must fault because the PTE has been
> +	 * cleared; the fault path has to take the folio lock before
> +	 * installing a writable mapping. Buffered write paths also
> +	 * have to take the folio lock before modifying file contents
> +	 * without a mapping, typically via write_begin_get_folio().
> +	 */
> +	if (!is_shmem && folio_test_dirty(folio)) {
> +		result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +		xas_unlock_irq(&xas);
> +		folio_putback_lru(folio);
> +		goto out_unlock;
Sashiko asked: Could a concurrent operation, such as truncate(), lock
the folio, remove it from the page cache, and drop the final reference
while we are jumping to xa_unlocked? If the page is freed back to the
buddy allocator before try_to_unmap_flush() completes, could this
leave a stale TLB entry pointing to the freed page, potentially
allowing memory corruption if the page is reallocated?

Answer: No. Before try_to_unmap_flush() runs, the folio still holds
its page cache reference plus the reference khugepaged took when
isolating it from the LRU, so the truncate-and-free sequence cannot
complete inside that small window.
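To make that lifetime argument concrete, here is a minimal sketch of
the ordering it relies on. This is not the real collapse_file() code:
collapse_one_sketch() is a made-up helper that compresses the relevant
steps, and most error handling is omitted.

static enum scan_result collapse_one_sketch(struct folio *folio)
{
	/*
	 * On success this takes its own reference, in addition to
	 * the page cache reference the folio already holds.
	 */
	if (!folio_isolate_lru(folio))
		return SCAN_DEL_PAGE_LRU;

	/* Clear the PTEs; the TLB flush is batched and deferred. */
	try_to_unmap(folio, TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

	/*
	 * The new check: a dirty folio here may still see writes
	 * through cached TLB entries, so give up on collapsing it.
	 */
	if (folio_test_dirty(folio)) {
		/*
		 * The real code still performs the deferred TLB
		 * flush later, at the xa_unlocked label.
		 */
		folio_putback_lru(folio);	/* drops the isolation ref */
		return SCAN_PAGE_DIRTY_OR_WRITEBACK;
	}

	/*
	 * khugepaged still holds the LRU-isolation reference, so
	 * even if truncate() removes the folio from the page cache
	 * now, the page cannot reach the buddy allocator before the
	 * deferred flush below has run.
	 */
	try_to_unmap_flush();
	return SCAN_SUCCEED;
}

The key point is that the deferred try_to_unmap_flush() always runs
before khugepaged drops the references it holds, so a stale TLB entry
can never outlive the page it maps.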
> +	}
> +
>  	/*
>  	 * Accumulate the folios that are being collapsed.
>  	 */
> --
> 2.53.0

Best Regards,
Yan, Zi
