On 4/29/26 9:29 AM, Zi Yan wrote:
This check ensures the correctness of read-only PMD folio collapse once it
is enabled for all filesystems supporting PMD-sized pagecache folios,
replacing READ_ONLY_THP_FOR_FS.

READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
aforementioned mechanism will go away with it. To ensure khugepaged
functions as expected after these changes, skip the collapse if any folio
is dirty after try_to_unmap(): a dirty folio at that point means the
read-only folio could still receive writes between try_to_unmap() and
try_to_unmap_flush() via cached TLB entries, and khugepaged does not yet
support collapsing writable pagecache folios.

Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Acked-by: David Hildenbrand (Arm) <[email protected]>

LGTM

Reviewed-by: Nico Pache <[email protected]>

---
  mm/khugepaged.c | 28 ++++++++++++++++++++++++----
  1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6808f2b48d864..71209a72195ab 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
                                }
                        } else if (folio_test_dirty(folio)) {
                                /*
-                                * khugepaged only works on read-only fd,
-                                * so this page is dirty because it hasn't
+                                * This page is dirty because it hasn't
                                 * been flushed since first write. There
                                 * won't be new dirty pages.
                                 *
@@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
                if (!is_shmem && (folio_test_dirty(folio) ||
                                  folio_test_writeback(folio))) {
                        /*
-                        * khugepaged only works on read-only fd, so this
-                        * folio is dirty because it hasn't been flushed
+                        * khugepaged only works on clean file-backed folios,
+                        * so this folio is dirty because it hasn't been flushed
                         * since first write.
                         */
                        result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
                        goto out_unlock;
                }
+               /*
+                * At this point, the folio is locked and unmapped. If the PTE
+                * was dirty, try_to_unmap() has transferred the dirty bit to
+                * the folio and we must not collapse it into a clean
+                * file-backed folio.
+                *
+                * If the folio is clean here, no one can write it until we
+                * drop the folio lock. A write through a stale TLB entry came
+                * from a clean PTE and must fault because the PTE has been
+                * cleared; the fault path has to take the folio lock before
+                * installing a writable mapping. Buffered write paths also
+                * have to take the folio lock before modifying file contents
+                * without a mapping, typically via write_begin_get_folio().
+                */
+               if (!is_shmem && folio_test_dirty(folio)) {
+                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+                       xas_unlock_irq(&xas);
+                       folio_putback_lru(folio);
+                       goto out_unlock;
+               }
+
                /*
                 * Accumulate the folios that are being collapsed.
                 */

