On Thu, Sep 17, 2020 at 11:44:54AM +1000, Dave Chinner wrote:
> So....
> 
> P0                                    p1
> 
> hole punch starts
>   takes XFS_MMAPLOCK_EXCL
>   truncate_pagecache_range()

... locks page ...

>     unmap_mapping_range(start, end)
>       <clears ptes>
>                                       <read fault>
>                                       do_fault_around()
>                                         ->map_pages
>                                           filemap_map_pages()

... trylock page fails ...

>                                             page mapping valid,
>                                             page is up to date
>                                             maps PTEs
>                                       <fault done>
>     truncate_inode_pages_range()
>       truncate_cleanup_page(page)
>         invalidates page
>       delete_from_page_cache_batch(page)
>         frees page
>                                       <pte now points to a freed page>
> 
> That doesn't seem good to me.
> 
> Sure, maybe the page hasn't been freed back to the free lists
> because of elevated refcounts. But it's been released by the
> filesystem and not longer in the page cache so nothing good can come
> of this situation...
> 
> AFAICT, this race condition exists for the truncate case as well
> as filemap_map_pages() doesn't have a "page beyond inode size" check
> in it. However, exposure here is very limited in the truncate case
> because truncate_setsize()->truncate_pagecache() zaps the PTEs
> again after invalidating the page cache.
> 
> Either way, adding the XFS_MMAPLOCK_SHARED around
> filemap_map_pages() avoids the race condition for fallocate and
> truncate operations for XFS...
> 
> > As such it is a rather
> > different beast compared to the fault handler from fs POV and does not need
> > protection from hole punching (current serialization on page lock and
> > checking of page->mapping is enough).
> > That being said I agree this is subtle and the moment someone adds e.g. a
> > readahead call into filemap_map_pages() we have a real problem. I'm not
> > sure how to prevent this risk...
> 
> Subtle, yes. So subtle, in fact, I fail to see any reason why the
> above race cannot occur as there's no obvious serialisation in the
> page cache between PTE zapping and page invalidation to prevent a
> fault from coming in an re-establishing the PTEs before invalidation
> occurs.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
> 

Reply via email to