This check ensures the correctness of collapsing read-only THPs for FSes after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting PMD THP pagecache.
READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps and inode->i_writecount to prevent any write to read-only folios that are about to be collapsed. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and that mechanism will go away with it. To ensure khugepaged keeps working as expected after those changes, skip the collapse if any folio is dirty after try_to_unmap(). A dirty folio means this read-only folio received writes via mmap, which can happen between try_to_unmap() and try_to_unmap_flush() through cached TLB entries, and khugepaged does not support collapsing writable pagecache folios yet.

Signed-off-by: Zi Yan <[email protected]>
---
 mm/khugepaged.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3eb5d982d3d3..1c0fdc81d276 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 				}
 			} else if (folio_test_dirty(folio)) {
 				/*
-				 * khugepaged only works on read-only fd,
-				 * so this page is dirty because it hasn't
+				 * This page is dirty because it hasn't
 				 * been flushed since first write. There
 				 * won't be new dirty pages.
 				 *
@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		if (!is_shmem && (folio_test_dirty(folio) ||
 				  folio_test_writeback(folio))) {
 			/*
-			 * khugepaged only works on read-only fd, so this
-			 * folio is dirty because it hasn't been flushed
+			 * khugepaged only works on clean file-backed folios,
+			 * so this folio is dirty because it hasn't been flushed
 			 * since first write.
 			 */
 			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto out_unlock;
 		}
 
+		/*
+		 * At this point, the folio is locked, unmapped. Make sure the
+		 * folio is clean, so that no one else is able to write to it,
+		 * since that would require taking the folio lock first.
+		 * Otherwise that means the folio was pointed by a dirty PTE and
+		 * some CPU might have a valid TLB entry with dirty bit set
+		 * still pointing to this folio and writes can happen without
+		 * causing a page table walk and folio lock acquisition before
+		 * the try_to_unmap_flush() below is done. After the collapse,
+		 * file-backed folio is not set as dirty and can be discarded
+		 * before any new write marks the folio dirty, causing data
+		 * corruption.
+		 */
+		if (!is_shmem && folio_test_dirty(folio)) {
+			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+			goto out_unlock;
+		}
+
 		/*
 		 * Accumulate the folios that are being collapsed.
 		 */
-- 
2.43.0
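The decision made by the new check can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical and none of it is kernel code. It assumes that unmapping carries a set PTE dirty bit over to the folio, and it shows that a non-shmem folio found dirty after the unmap step is rejected, while a clean folio or a shmem folio passes.

```c
#include <assert.h>
#include <stdbool.h>

/* All toy_* names are hypothetical user-space stand-ins, not kernel APIs. */
enum toy_scan_result {
	TOY_SCAN_SUCCEED,
	TOY_SCAN_PAGE_DIRTY_OR_WRITEBACK,
};

struct toy_pte {
	bool present;
	bool dirty;	/* set once a write went through this mapping */
};

struct toy_folio {
	bool dirty;
};

/*
 * Models the unmap step (assumption): clear the PTE and carry its
 * dirty bit over to the folio.
 */
static void toy_unmap(struct toy_folio *folio, struct toy_pte *pte)
{
	if (pte->present && pte->dirty)
		folio->dirty = true;
	pte->present = false;
	pte->dirty = false;
}

/* Models the added check: a dirty, non-shmem folio is not collapsed. */
static enum toy_scan_result
toy_check_after_unmap(const struct toy_folio *folio, bool is_shmem)
{
	if (!is_shmem && folio->dirty)
		return TOY_SCAN_PAGE_DIRTY_OR_WRITEBACK;
	return TOY_SCAN_SUCCEED;
}

/* Runs the unmap step followed by the check for a single folio. */
static enum toy_scan_result toy_collapse_one(bool pte_was_dirty, bool is_shmem)
{
	struct toy_pte pte = { .present = true, .dirty = pte_was_dirty };
	struct toy_folio folio = { .dirty = false };

	toy_unmap(&folio, &pte);
	assert(!pte.present);	/* the mapping is gone after the unmap step */
	return toy_check_after_unmap(&folio, is_shmem);
}
```

In this model a dirty PTE stands for the case where a stale TLB entry could still allow writes before the flush completes, so rejecting the folio is the conservative choice. The model does not attempt to simulate TLBs or locking.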

