On 2020/12/10 11:30 PM, Nikolay Borisov wrote:
On 10.12.20, 8:38, Qu Wenruo wrote:
For subpage case, we need to allocate new memory for each metadata page.
So we need to:
- Allow attach_extent_buffer_page() to return int
to indicate allocation failure
- Prealloc page->private for alloc_extent_buffer()
We don't want to allocate memory with a spinlock held, so
do the preallocation before we acquire the spinlock.
- Handle the subpage and regular cases differently in
attach_extent_buffer_page()
For the regular case, just do the usual thing.
For the subpage case, allocate new memory and update the tree_block
bitmap.
The bitmap update will be handled by a new subpage-specific helper,
btrfs_subpage_set_tree_block().
Signed-off-by: Qu Wenruo <[email protected]>
---
fs/btrfs/extent_io.c | 69 +++++++++++++++++++++++++++++++++++---------
fs/btrfs/subpage.h | 44 ++++++++++++++++++++++++++++
2 files changed, 99 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6350c2687c7e..51dd7ec3c2b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -24,6 +24,7 @@
#include "rcu-string.h"
#include "backref.h"
#include "disk-io.h"
+#include "subpage.h"
static struct kmem_cache *extent_state_cache;
static struct kmem_cache *extent_buffer_cache;
@@ -3142,22 +3143,41 @@ static int submit_extent_page(unsigned int opf,
return ret;
}
-static void attach_extent_buffer_page(struct extent_buffer *eb,
+static int attach_extent_buffer_page(struct extent_buffer *eb,
struct page *page)
{
- /*
- * If the page is mapped to btree inode, we should hold the private
- * lock to prevent race.
- * For cloned or dummy extent buffers, their pages are not mapped and
- * will not race with any other ebs.
- */
- if (page->mapping)
- lockdep_assert_held(&page->mapping->private_lock);
+ struct btrfs_fs_info *fs_info = eb->fs_info;
+ int ret;
- if (!PagePrivate(page))
- attach_page_private(page, eb);
- else
- WARN_ON(page->private != (unsigned long)eb);
+ if (fs_info->sectorsize == PAGE_SIZE) {
+ /*
+ * If the page is mapped to btree inode, we should hold the
+ * private lock to prevent race.
+ * For cloned or dummy extent buffers, their pages are not
+ * mapped and will not race with any other ebs.
+ */
+ if (page->mapping)
+ lockdep_assert_held(&page->mapping->private_lock);
+
+ if (!PagePrivate(page))
+ attach_page_private(page, eb);
+ else
+ WARN_ON(page->private != (unsigned long)eb);
+ return 0;
+ }
+
+ /* Already mapped, just update the existing range */
+ if (PagePrivate(page))
+ goto update_bitmap;
How can this check ever be false, given that btrfs_attach_subpage() is called
unconditionally in alloc_extent_buffer() so that you can avoid allocating
memory with the private lock held, yet in this function you check whether memory
hasn't been allocated and proceed to allocate it? Also, that memory
allocation is done with GFP_NOFS under a spinlock. That's not atomic, i.e.
IO can still be kicked, which means you can go to sleep while holding a
spinlock. Not cool.
There are two callers of attach_extent_buffer_page(). One is in
alloc_extent_buffer(), where we pre-allocate page::private before
calling attach_extent_buffer_page(),
and that pre-allocation happens outside the spinlock.
Thus there is no memory allocation at all at that call site.
The other caller is btrfs_clone_extent_buffer(), which needs a proper
memory allocation.
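The ordering described above (allocate first with no lock held, then take the lock and only touch memory that already exists) can be sketched in userspace C. This is a hedged illustration, not the btrfs code. fake_page, fake_subpage, fake_attach_subpage() and fake_alloc_eb() are hypothetical stand-ins, and a pthread mutex plays the role of mapping->private_lock. Concurrent calls to the allocation step are assumed to be serialized by something else.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the real kernel structures. */
struct fake_subpage {
	uint16_t tree_block_bitmap;
};

struct fake_page {
	pthread_mutex_t private_lock;	/* plays the role of mapping->private_lock */
	struct fake_subpage *private;	/* plays the role of page->private */
};

/*
 * Step 1, lock NOT held: allocate page->private if it is missing.
 * This is the only place that allocates (and, in the kernel, may sleep).
 * Assumption: callers of this step are serialized by some other means.
 */
static int fake_attach_subpage(struct fake_page *page)
{
	if (page->private)
		return 0;		/* already attached, nothing to allocate */
	page->private = calloc(1, sizeof(*page->private));
	return page->private ? 0 : -1;	/* -1 stands in for -ENOMEM */
}

/*
 * Step 2, lock held: no allocation, only update existing memory.
 * Mirrors the "already mapped, just update the existing range" path.
 */
static void fake_attach_eb_locked(struct fake_page *page, uint16_t bits)
{
	assert(page->private != NULL);	/* must have been preallocated in step 1 */
	page->private->tree_block_bitmap |= bits;
}

/* Caller pattern modeled on the alloc_extent_buffer() path: allocate, then lock. */
static int fake_alloc_eb(struct fake_page *page, uint16_t bits)
{
	int ret = fake_attach_subpage(page);

	if (ret < 0)
		return ret;
	pthread_mutex_lock(&page->private_lock);
	fake_attach_eb_locked(page, bits);
	pthread_mutex_unlock(&page->private_lock);
	return 0;
}
```

The second call on the same page finds private already set and skips straight to the locked bitmap update. The btrfs_clone_extent_buffer() caller, which does its own allocation, is not modeled here.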
+
+ /* Do new allocation to attach subpage */
+ ret = btrfs_attach_subpage(fs_info, page);
+ if (ret < 0)
+ return ret;
+
+update_bitmap:
+ btrfs_subpage_set_tree_block(fs_info, page, eb->start, eb->len);
+ return 0;
Those are really 2 functions, demarcated by the if. Given that
attach_extent_buffer_page() is called in only 2 places, can't you open-code the
if (fs_info->sectorsize == PAGE_SIZE) check in the callers and define 2 functions:
1 for the subpage blocksize and the other one for the old code?
I tried that, and it looks much worse than the current code, especially since we
would need an extra level of indentation in btrfs_clone_extent_buffer().
}
<snip>
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 96f3b226913e..c2ce603e7848 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -23,9 +23,53 @@
struct btrfs_subpage {
/* Common members for both data and metadata pages */
spinlock_t lock;
+ union {
+ /* Structures only used by metadata */
+ struct {
+ u16 tree_block_bitmap;
+ };
+ /* structures only used by data */
+ };
};
int btrfs_attach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
void btrfs_detach_subpage(struct btrfs_fs_info *fs_info, struct page *page);
+/*
+ * Convert the [start, start + len) range into a u16 bitmap
+ *
+ * E.g. with a 4K sectorsize, if start == page_offset() + 16K and len == 16K,
+ * we get 0x00f0.
+ */
+static inline u16 btrfs_subpage_calc_bitmap(struct btrfs_fs_info *fs_info,
+ struct page *page, u64 start, u32 len)
+{
+ int bit_start = (start - page_offset(page)) >> fs_info->sectorsize_bits;
+ int nbits = len >> fs_info->sectorsize_bits;
+
+ /* Basic checks */
+ ASSERT(PagePrivate(page) && page->private);
+ ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
+ IS_ALIGNED(len, fs_info->sectorsize));
Use separate ASSERTs for the two alignment checks, so that if one fails it's evident which one.
I think we are starting to forget what ASSERT() should be used for.
It's for conditions that shouldn't fail.
It's not meant as a less terrible BUG_ON(), but to document what's
expected. Thus I don't really expect it to be triggered, and it doesn't
matter whether the check is written as two lines or one.
What's your opinion on this, David?
+ ASSERT(page_offset(page) <= start &&
+ start + len <= page_offset(page) + PAGE_SIZE);
Ditto. Also, instead of checking 'page_offset(page) <= start' you can
simply check 'bit_start >= 0', as that's what you ultimately care about.
ASSERT() usage aside, the check in terms of start + len and page_offset() is much
easier to grasp, without needing to refer to bit_start.
Thanks,
Qu
+ /*
+ * Here nbits can be 16, so (1 << nbits) goes beyond the u16 range. We do
+ * the first left shift in unsigned long (at least 32 bits wide), then
+ * truncate the result to u16.
+ return (u16)(((1UL << nbits) - 1) << bit_start);
+}
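To make the bitmap arithmetic concrete, here is a userspace port of btrfs_subpage_calc_bitmap(). It assumes a 64K page with a 4K sectorsize, so sectorsize_bits == 12. In this sketch, page_start replaces page_offset(page) and assert() replaces ASSERT(). The sketch_* names and constants are hypothetical. With the page at offset 3 * 64K:
- a 16K range starting 16K into the page gives 0x00f0;
- the whole page gives 0xffff, which is the nbits == 16 case the comment warns about;
- the last 4K sector gives 0x8000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64K pages, 4K sectors (sectorsize_bits == 12). */
#define SKETCH_PAGE_SIZE	65536u
#define SKETCH_SECTORSIZE	4096u
#define SKETCH_SECTOR_BITS	12u

/* Convert [start, start + len) inside the page at page_start into a u16 bitmap. */
static uint16_t sketch_calc_bitmap(uint64_t page_start, uint64_t start, uint32_t len)
{
	/* Same expectations the kernel helper states with ASSERT(). */
	assert(start % SKETCH_SECTORSIZE == 0);
	assert(len % SKETCH_SECTORSIZE == 0);
	assert(page_start <= start && start + len <= page_start + SKETCH_PAGE_SIZE);

	int bit_start = (int)((start - page_start) >> SKETCH_SECTOR_BITS);
	int nbits = (int)(len >> SKETCH_SECTOR_BITS);

	/*
	 * nbits may be 16, and (1 << 16) does not fit in 16 bits, so the mask
	 * is built in unsigned long and only then truncated to uint16_t.
	 */
	return (uint16_t)(((1UL << nbits) - 1) << bit_start);
}
```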
+
+static inline void btrfs_subpage_set_tree_block(struct btrfs_fs_info *fs_info,
+ struct page *page, u64 start, u32 len)
+{
+ struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+ unsigned long flags;
+ u16 tmp = btrfs_subpage_calc_bitmap(fs_info, page, start, len);
+
+ spin_lock_irqsave(&subpage->lock, flags);
+ subpage->tree_block_bitmap |= tmp;
+ spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
#endif /* BTRFS_SUBPAGE_H */
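btrfs_subpage_set_tree_block() only ever ORs bits in. As a result, several extent buffers that share one page accumulate in tree_block_bitmap without clearing each other's sectors. Below is a minimal sketch of that behaviour, under the same assumed 64K page / 4K sector geometry. A pthread mutex stands in for the subpage spinlock (the kernel helper uses spin_lock_irqsave()), and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS	12u	/* assumed 4K sectorsize */

struct sketch_subpage {
	pthread_mutex_t lock;		/* stands in for the spinlock_t */
	uint16_t tree_block_bitmap;
};

/* Same range -> bitmap conversion as btrfs_subpage_calc_bitmap(), checks omitted. */
static uint16_t sketch_range_bits(uint64_t page_start, uint64_t start, uint32_t len)
{
	int bit_start = (int)((start - page_start) >> SKETCH_SECTOR_BITS);
	int nbits = (int)(len >> SKETCH_SECTOR_BITS);

	return (uint16_t)(((1UL << nbits) - 1) << bit_start);
}

/* Mark the eb's sectors as tree blocks; bits set by other ebs are preserved. */
static void sketch_set_tree_block(struct sketch_subpage *sp, uint64_t page_start,
				  uint64_t start, uint32_t len)
{
	uint16_t bits = sketch_range_bits(page_start, start, len);

	pthread_mutex_lock(&sp->lock);
	sp->tree_block_bitmap |= bits;
	pthread_mutex_unlock(&sp->lock);
}
```

For example, two 16K tree blocks at page offsets 0 and 32K give 0x0f0f. Setting an already-set range again leaves the bitmap unchanged, because OR is idempotent.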