On 2020/12/18 11:41 PM, Josef Bacik wrote:
On 12/17/20 7:44 PM, Qu Wenruo wrote:


On 2020/12/18 12:00 AM, Josef Bacik wrote:
On 12/10/20 1:38 AM, Qu Wenruo wrote:
For subpage case, we need to allocate new memory for each metadata page.

So we need to:
- Allow attach_extent_buffer_page() to return int
   To indicate allocation failure

- Prealloc page->private for alloc_extent_buffer()
   We don't want to do memory allocation while holding a spinlock, so
   do the preallocation before we acquire the spinlock.

- Handle subpage and regular case differently in
   attach_extent_buffer_page()
   For regular case, just do the usual thing.
   For subpage case, allocate new memory and update the tree_block
   bitmap.

   The bitmap update will be handled by new subpage specific helper,
   btrfs_subpage_set_tree_block().

Signed-off-by: Qu Wenruo <[email protected]>
---
  fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
  fs/btrfs/subpage.h   | 44 ++++++++++++++++++++++++++++
  2 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6350c2687c7e..51dd7ec3c2b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
  #include "rcu-string.h"
  #include "backref.h"
  #include "disk-io.h"
+#include "subpage.h"
  static struct kmem_cache *extent_state_cache;
  static struct kmem_cache *extent_buffer_cache;
@@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
      return ret;
  }
-static void attach_extent_buffer_page(struct extent_buffer *eb,
+static int attach_extent_buffer_page(struct extent_buffer *eb,
                        struct page *page)
  {
-    /*
-     * If the page is mapped to btree inode, we should hold the private
-     * lock to prevent race.
-     * For cloned or dummy extent buffers, their pages are not mapped and
-     * will not race with any other ebs.
-     */
-    if (page->mapping)
-        lockdep_assert_held(&page->mapping->private_lock);
+    struct btrfs_fs_info *fs_info = eb->fs_info;
+    int ret;
-    if (!PagePrivate(page))
-        attach_page_private(page, eb);
-    else
-        WARN_ON(page->private != (unsigned long)eb);
+    if (fs_info->sectorsize == PAGE_SIZE) {
+        /*
+         * If the page is mapped to btree inode, we should hold the
+         * private lock to prevent race.
+         * For cloned or dummy extent buffers, their pages are not
+         * mapped and will not race with any other ebs.
+         */
+        if (page->mapping)
+            lockdep_assert_held(&page->mapping->private_lock);
+
+        if (!PagePrivate(page))
+            attach_page_private(page, eb);
+        else
+            WARN_ON(page->private != (unsigned long)eb);
+        return 0;
+    }
+
+    /* Already mapped, just update the existing range */
+    if (PagePrivate(page))
+        goto update_bitmap;
+
+    /* Do new allocation to attach subpage */
+    ret = btrfs_attach_subpage(fs_info, page);
+    if (ret < 0)
+        return ret;
+
+update_bitmap:
+    btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
+    return 0;
  }
  void set_page_extent_mapped(struct page *page)
@@ -5067,12 +5087,19 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
          return NULL;
      for (i = 0; i < num_pages; i++) {
+        int ret;
+
          p = alloc_page(GFP_NOFS);
          if (!p) {
              btrfs_release_extent_buffer(new);
              return NULL;
          }
-        attach_extent_buffer_page(new, p);
+        ret = attach_extent_buffer_page(new, p);
+        if (ret < 0) {
+            put_page(p);
+            btrfs_release_extent_buffer(new);
+            return NULL;
+        }
          WARN_ON(PageDirty(p));
          SetPageUptodate(p);
          new->pages[i] = p;
@@ -5321,6 +5348,18 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
              goto free_eb;
          }
+        /*
+         * Preallocate page->private for the subpage case, so that
+         * we won't allocate memory while holding private_lock.
+         */
+        ret = btrfs_attach_subpage(fs_info, p);
+        if (ret < 0) {
+            unlock_page(p);
+            put_page(p);
+            exists = ERR_PTR(-ENOMEM);
+            goto free_eb;
+        }
+

This is broken: if we race with another thread adding an extent buffer for this same range, we'll overwrite the page private with the new thing, losing any of the work that was done previously.  Thanks,

Firstly, the page is locked, so only one task can grab the page at a time.

Secondly, btrfs_attach_subpage() just exits if it detects the page already has private set.

So there shouldn't be a race.

Task1                               Task2
alloc_extent_buffer(4096)           alloc_extent_buffer(4096)
  find_extent_buffer(), nothing       find_extent_buffer(), nothing
  find_or_create_page(1)
                                    find_or_create_page(1)
                                      waits on page lock
  btrfs_attach_subpage()
  radix_tree_insert()
  unlock pages
                                    exit find_or_create_page()
                                    btrfs_attach_subpage(), BAD

There's definitely a race; again, this is why the code checks whether there's already a private attached to the EB.  Thanks,

btrfs_attach_subpage() is already doing the private check.

Thanks,
Qu


Josef
