On 4/29/26 9:29 AM, Zi Yan wrote:
> This check ensures the correctness of read-only PMD folio collapse
> after it is enabled for all filesystems supporting PMD pagecache
> folios, replacing READ_ONLY_THP_FOR_FS.
>
> READ_ONLY_THP_FOR_FS only supports read-only fds and uses
> mapping->nr_thps and inode->i_writecount to prevent any write to
> read-only to-be-collapsed folios. In upcoming commits,
> READ_ONLY_THP_FOR_FS will be removed and that mechanism will go away
> with it.
>
> To ensure khugepaged functions as expected after the change, skip the
> collapse if any folio is dirty after try_to_unmap(): a dirty folio at
> that point means the read-only folio can still receive writes between
> try_to_unmap() and try_to_unmap_flush() via cached TLB entries, and
> khugepaged does not yet support collapsing writable pagecache folios.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> Acked-by: David Hildenbrand (Arm) <[email protected]>
LGTM

Reviewed-by: Nico Pache <[email protected]>
> ---
>  mm/khugepaged.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6808f2b48d864..71209a72195ab 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  			}
>  		} else if (folio_test_dirty(folio)) {
>  			/*
> -			 * khugepaged only works on read-only fd,
> -			 * so this page is dirty because it hasn't
> +			 * This page is dirty because it hasn't
>  			 * been flushed since first write. There
>  			 * won't be new dirty pages.
>  			 *
> @@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		if (!is_shmem && (folio_test_dirty(folio) ||
>  				  folio_test_writeback(folio))) {
>  			/*
> -			 * khugepaged only works on read-only fd, so this
> -			 * folio is dirty because it hasn't been flushed
> +			 * khugepaged only works on clean file-backed folios,
> +			 * so this folio is dirty because it hasn't been flushed
>  			 * since first write.
>  			 */
>  			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> @@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  			goto out_unlock;
>  		}
> +
> +		/*
> +		 * At this point, the folio is locked and unmapped. If the PTE
> +		 * was dirty, try_to_unmap() has transferred the dirty bit to
> +		 * the folio and we must not collapse it into a clean
> +		 * file-backed folio.
> +		 *
> +		 * If the folio is clean here, no one can write it until we
> +		 * drop the folio lock. A write through a stale TLB entry came
> +		 * from a clean PTE and must fault because the PTE has been
> +		 * cleared; the fault path has to take the folio lock before
> +		 * installing a writable mapping. Buffered write paths also
> +		 * have to take the folio lock before modifying file contents
> +		 * without a mapping, typically via write_begin_get_folio().
> +		 */
> +		if (!is_shmem && folio_test_dirty(folio)) {
> +			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +			xas_unlock_irq(&xas);
> +			folio_putback_lru(folio);
> +			goto out_unlock;
> +		}
>
>  		/*
>  		 * Accumulate the folios that are being collapsed.
>  		 */

