On 1/16/21 2:15 AM, Qu Wenruo wrote:
In btrfs_release_extent_buffer_pages(), we need to add extra handling for subpage.To do so, introduce a new helper, detach_extent_buffer_page(), to do different handling for regular and subpage cases. For subpage case, the new trick is about when to detach the page private. For unammped (dummy or cloned) ebs, we can detach the page private immediately as the page can only be attached to one unmapped eb. For mapped ebs, we have to ensure there are no eb in the page range before we delete it, as page->private is shared between all ebs in the same page. But there is a subpage specific race, where we can race with extent buffer allocation, and clear the page private while new eb is still being utilized, like this: Extent buffer A is the new extent buffer which will be allocated, while extent buffer B is the last existing extent buffer of the page. T1 (eb A) | T2 (eb B) -------------------------------+------------------------------ alloc_extent_buffer() | btrfs_release_extent_buffer_pages() |- p = find_or_create_page() | | |- attach_extent_buffer_page() | | | | |- detach_extent_buffer_page() | | |- if (!page_range_has_eb()) | | | No new eb in the page range yet | | | As new eb A hasn't yet been | | | inserted into radix tree. | | |- btrfs_detach_subpage() | | |- detach_page_private(); |- radix_tree_insert() | Then we have a metadata eb whose page has no private bit. To avoid such race, we introduce a subpage metadata specific member, btrfs_subpage::under_alloc. In alloc_extent_buffer() we set that bit with the critical section of private_lock. So that page_range_has_eb() will return true for detach_extent_buffer_page(), and not to detach page private. New helpers are introduced to do the start/end work: - btrfs_page_start_meta_alloc() - btrfs_page_end_meta_alloc() Signed-off-by: Qu Wenruo <[email protected]>
This is overly complicated, why not just have subpage->refs or ->attached as eb's are attached to the subpage? Then we only detach the subpage on the last eb that's referencing that page. This is much more straightforward and avoids a whole other radix tree lookup on release. Thanks,
Josef
