On 29 Apr 2026, at 11:29, Zi Yan wrote:

> This check ensures the correctness of read-only PMD folio collapse once
> it is enabled for all filesystems supporting PMD-sized pagecache folios
> and replaces READ_ONLY_THP_FOR_FS.
>
> READ_ONLY_THP_FOR_FS only supports read-only fds and uses
> mapping->nr_thps and inode->i_writecount to prevent writes to the
> read-only folios being collapsed. In upcoming commits,
> READ_ONLY_THP_FOR_FS will be removed and that mechanism will go away
> with it. To ensure khugepaged keeps functioning as expected after the
> change, skip the collapse if any folio is dirty after try_to_unmap():
> a dirty folio at that point means the read-only folio can still receive
> writes between try_to_unmap() and try_to_unmap_flush() via cached TLB
> entries, and khugepaged does not yet support collapsing writable
> pagecache folios.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Baolin Wang <[email protected]>
> Acked-by: David Hildenbrand (Arm) <[email protected]>
> ---
>  mm/khugepaged.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6808f2b48d864..71209a72195ab 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                               }
>                       } else if (folio_test_dirty(folio)) {
>                               /*
> -                              * khugepaged only works on read-only fd,
> -                              * so this page is dirty because it hasn't
> +                              * This page is dirty because it hasn't
>                                * been flushed since first write. There
>                                * won't be new dirty pages.
>                                *
> @@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>               if (!is_shmem && (folio_test_dirty(folio) ||
>                                 folio_test_writeback(folio))) {
>                       /*
> -                      * khugepaged only works on read-only fd, so this
> -                      * folio is dirty because it hasn't been flushed
> +                      * khugepaged only works on clean file-backed folios,
> +                      * so this folio is dirty because it hasn't been flushed
>                        * since first write.
>                        */
>                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> @@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                       goto out_unlock;
>               }
>
> +             /*
> +              * At this point, the folio is locked and unmapped. If the PTE
> +              * was dirty, try_to_unmap() has transferred the dirty bit to
> +              * the folio and we must not collapse it into a clean
> +              * file-backed folio.
> +              *
> +              * If the folio is clean here, no one can write it until we
> +              * drop the folio lock. A write through a stale TLB entry came
> +              * from a clean PTE and must fault because the PTE has been
> +              * cleared; the fault path has to take the folio lock before
> +              * installing a writable mapping. Buffered write paths that
> +              * modify file contents without a mapping also have to take
> +              * the folio lock, typically via write_begin_get_folio().
> +              */
> +             if (!is_shmem && folio_test_dirty(folio)) {
> +                     result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +                     xas_unlock_irq(&xas);
> +                     folio_putback_lru(folio);
> +                     goto out_unlock;

Sashiko asked:

Could a concurrent operation, such as truncate(), lock the folio, remove it
from the page cache, and drop the final reference while we are jumping to
xa_unlocked?
If the page is freed back to the buddy allocator before try_to_unmap_flush()
completes, could this leave a stale TLB entry pointing to the freed page,
potentially allowing memory corruption if the page is reallocated?

Answer:

The folio still holds its pagecache and LRU references before
try_to_unmap_flush(), so the truncate-and-free cannot complete in that small
window.
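
For context, a rough sketch of where that error path sits relative to the
deferred TLB flush in collapse_file() (paraphrased from memory and heavily
simplified; treat the exact placement of labels and cleanup as illustrative
rather than the verbatim upstream code):

        if (folio_mapped(folio))
                try_to_unmap(folio, TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

        xas_lock_irq(&xas);
        /* ... other checks on the locked, unmapped folio ... */

        /* the bail-out added by this patch */
        if (!is_shmem && folio_test_dirty(folio)) {
                result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
                xas_unlock_irq(&xas);
                folio_putback_lru(folio);
                goto out_unlock;        /* folio_unlock() + folio_put() */
        }
        /* ... */

xa_unlocked:
        /*
         * Deferred flush for the TTU_BATCH_FLUSH unmap above. As said,
         * the folio's remaining references keep it from being freed back
         * to the buddy allocator before this point.
         */
        try_to_unmap_flush();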

> +             }
> +
>               /*
>                * Accumulate the folios that are being collapsed.
>                */
> -- 
> 2.53.0
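
As an aside, the window the commit message describes can be pictured as one
possible interleaving (simplified, and assuming hardware-managed dirty bits
plus another CPU still caching a writable, already-dirty TLB entry for the
folio):

    /*
     * khugepaged                            other CPU, stale TLB entry
     *
     * try_to_unmap(folio, ...BATCH_FLUSH)
     *   PTE cleared, dirty bit moved
     *   to the folio, TLB flush deferred
     *                                       store through the cached entry
     *                                       lands in the folio (no fault,
     *                                       no page table walk needed)
     * folio_test_dirty(folio) == true
     *   -> skip the collapse (this patch)
     * try_to_unmap_flush()
     *   stale entry gone; later writes
     *   fault on the cleared PTE
     */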


Best Regards,
Yan, Zi
