On 2021/4/2 下午4:52, Qu Wenruo wrote:


On 2021/4/2 下午4:46, Ritesh Harjani wrote:
On 21/04/02 04:36PM, Qu Wenruo wrote:


On 2021/4/2 下午4:33, Ritesh Harjani wrote:
On 21/03/29 10:01AM, Qu Wenruo wrote:


On 2021/3/29 上午4:02, Ritesh Harjani wrote:
On 21/03/25 09:16PM, Qu Wenruo wrote:


On 2021/3/25 下午8:20, Neal Gompa wrote:
On Thu, Mar 25, 2021 at 3:17 AM Qu Wenruo <w...@suse.com> wrote:

This patchset can be fetched from the following github repo,
along with
the full subpage RW support:
https://github.com/adam900710/linux/tree/subpage

This patchset is for metadata read write support.

[FULL RW TEST]
Since the data write path is not included in this patchset, we
can't
really test the patchset itself, but anyone can grab the patch
from
github repo and do fstests/generic tests.

But at least the full RW patchset can pass -g generic/quick -x
defrag
for now.

There are some known issues:

- Defrag behavior change
      Since current defrag is doing per-page defrag, to support
subpage
      defrag, we need some change in the loop.
      E.g. if a page has both hole and regular extents in it,
then defrag
      will rewrite the full 64K page.

      Thus for now, defrag related failure is expected.
      But this should only cause behavior difference, no crash
nor hang is
      expected.

- No compression support yet
      There are at least 2 known bugs if forcing compression
for subpage
      * Some hard coded PAGE_SIZE screwing up space rsv
      * Subpage ASSERT() triggered
        This is because some compression code is unlocking
locked_page by
        calling extent_clear_unlock_delalloc() with locked_page
== NULL.
      So for now compression is also disabled.

- Inode nbytes mismatch
      Still debugging.
      The fastest way to trigger is fsx using the following
parameters:

        fsx -l 262144 -o 65536 -S 30073 -N 256 -R -W $mnt/file
> /tmp/fsx

      Which would cause inode nbytes differs from expected
value and
      triggers btrfs check error.

[DIFFERENCE AGAINST REGULAR SECTORSIZE]
The metadata part in fact has more new code than data part, as
ithas
some different behaviors compared to the regular sector size
handling:

- No more page locking
      Now metadata read/write relies on extent io tree locking,
other than
      page locking.
      This is to allow behaviors like read lock one eb while
alsotry to
      read lock another eb in the same page.
      We can't rely on page lock as now we have multiple extent
buffers in
      the same page.

- Page status update
      Now we use subpage wrappers to handle page status update.

- How to submit dirty extent buffers
      Instead of just grabbing extent buffer from
page::private, we need to
      iterate all dirty extent buffers in the page and submit
them.

[CHANGELOG]
v2:
- Rebased to latest misc-next
      No conflicts at all.

- Add new sysfs interface to grab supported RO/RW sectorsize
      This will allow mkfs.btrfs to detect unmountable fs better.

- Use newer naming schema for each patch
      No more "extent_io:" or "inode:" schema anymore.

- Move two pure cleanups to the series
      Patch 2~3, originally in RW part.

- Fix one uninitialized variable
      Patch 6.

v3:
- Rename the sysfs to supported_sectorsizes

- Rebased to latest misc-next branch
      This removes 2 cleanup patches.

- Add new overview comment for subpage metadata

Qu Wenruo (13):
      btrfs: add sysfs interface for supported sectorsize
      btrfs: use min() to replace open-code in
btrfs_invalidatepage()
      btrfs: remove unnecessary variable shadowing in
btrfs_invalidatepage()
      btrfs: refactor how we iterate ordered extent in
        btrfs_invalidatepage()
      btrfs: introduce helpers for subpage dirty status
      btrfs: introduce helpers for subpage writeback status
      btrfs: allow btree_set_page_dirty() to do more sanity
checkon subpage
        metadata
      btrfs: support subpage metadata csum calculation at write
time
      btrfs: make alloc_extent_buffer() check subpage dirty bitmap
      btrfs: make the page uptodate assert to be subpage
compatible
      btrfs: make set/clear_extent_buffer_dirty() to be subpage
compatible
      btrfs: make set_btree_ioerr() accept extent buffer and to
be subpage
        compatible
      btrfs: add subpage overview comments

     fs/btrfs/disk-io.c   | 143
++++++++++++++++++++++++++++++++++---------
     fs/btrfs/extent_io.c | 127
++++++++++++++++++++++++++++----------
     fs/btrfs/inode.c     | 128
++++++++++++++++++++++----------------
     fs/btrfs/subpage.c   | 127
++++++++++++++++++++++++++++++++++++++
     fs/btrfs/subpage.h   |  17 +++++
     fs/btrfs/sysfs.c     |  15 +++++
     6 files changed, 441 insertions(+), 116 deletions(-)

--
2.30.1


Why wouldn't we just integrate full read-write support with the
caveats as described now? It seems to be relatively reasonable
to do
that, and this patch set is essentially unusable without the
rest of
it that does enable full read-write support.

The metadata part is much more stable than data path (almost not
touched
for several months), and the metadata part already has some
difference
in its behavior, which needs review.

You point makes some sense, but I still don't believe pushing a
super
large patchset does any help for the review.

If you want to test, you can grab the branch from the github repo.
If you want to review, the mails are all here for review.

In fact, we used to have subpage support sent as a big patchset
from IBM
guys, but the result is only some preparation patches get merged,
and
nothing more.

Using this multi-series method, we're already doing better work and
received more testing (to ensure regular sectorsize is not
affectedat
least).

Hi Qu Wenruo,

Sorry about chiming in late on this. I don't have any strong
objection on either
approach. Although sometime back when I tested your RW support git
tree on
Power, the unmount patch itself was crashing. I didn't debug it
thattime
(this was a month back or so), so I also didn't bother testing
xfstests on Power.

But we do have an interest in making sure this patch series work
on bs < ps
on Power platform. I can try helping with testing, reviewing (to
best of my
knowledge) and fixing anything is possible :)

That's great!

One of my biggest problem here is, I don't have good enough testing
environment.

Although SUSE has internal clouds for ARM64/PPC64, but due to the
f**king Great Firewall, it's super slow to access, no to mention doing
proper debugging.

Currently I'm using two ARM SBCs, RK3399 and A311D based, to do the
test.
But their computing power is far from ideal, only generic/quick can
finish in hours.

Thus real world Power could definitely help.

Let me try and pull your tree and test it on Power. Please let me
know if there
is anything needs to be taken care apart from your github tree and
btrfs-progs
branch with bs < ps support.

If you're going to test the branch, here are some small notes:

- Need to use latest btrfs-progs
    As it fixes a false alert on crossing 64K page boundary.

- Need to slightly modify btrfs-progs to avoid false alerts
    For subpage case, mkfs.btrfs will output a warning, but that
warning
    is outputted into stderr, which will screw up generic test groups.
    It's recommended to apply the following diff:

diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 569208a9..21976554 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -341,8 +341,8 @@ int btrfs_check_sectorsize(u32 sectorsize)
                  return -EINVAL;
          }
          if (page_size != sectorsize)
-               warning(
-"the filesystem may not be mountable, sectorsize %u doesn't match
page
size %u",
+               printf(
+"the filesystem may not be mountable, sectorsize %u doesn't match
page
size %u\n",
                          sectorsize, page_size);
          return 0;
   }

- Xfstest/btrfs group will crash at btrfs/143
    Still investigating, but you can ignore btrfs group for now.

- Very rare hang
    There is a very low change to hang, with "bad ordered accounting"
    dmesg.
    If you can hit, please let me know.
    I had something idea to fix it, but not yet in the branch.

- btrfs inode nbytes mismatch
    Investigating, as it will make btrfs-check to report error.

The last two bugs are the final show blocker, I'll give you extra
updates when those are fixed.

Thanks Qu Wenruo, for above info.
I cloned below git tree as mentioned in your git log to test for RW
onPower.
However, I still see that RW mount for bs < ps is disabled for in
open_ctree()
https://github.com/adam900710/linux/tree/subpage

I see below code present in this tree.
           /* For 4K sector size support, it's only read-only */
           if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
                   if (!sb_rdonly(sb) ||
btrfs_super_log_root(disk_super)) {
                           btrfs_err(fs_info,
           "subpage sectorsize %u only supported read-only for page
size %lu",
                                   sectorsize, PAGE_SIZE);
                           err = -EINVAL;
                           goto fail_alloc;
                   }
           }

Could you pls point me to the tree I can use for bs < ps testing on
Power?
Sorry if I missed something.

Sorry, I updated the branch to my current development progress, it's now
at the ordered extent rework part, without the remaining subpage
functionality at all.

You may want to grab this tree instead:
https://github.com/adam900710/linux/tree/subpage_old

But please keep in mind that, you may get random hang, and certain
generic test case, especially generic/075 can corrupt the inode nbytes
and leaving all later test cases using TEST_DEV to report error on fsck.


Thanks for quick response. Sure, I will exclude generic/075 from the test
for now.

Not only generic/075, but all tests running fsx may cause inode nbytes
corruption.

Thus I'd recommend either modify btrfs-check to ignore it, or re-mkfs on
TEST_DEV after each test case.

Good news, you can fetch the subpage branch for better test results.

Now the branch should pass all generic tests, except defrag and known
failures.
And no more random crash during the tests.

And for btrfs/143, it will no longer trigger a BUG_ON(), although at the
cost of worse granularity for repair.
(Now it's per-bvec repair, not yet fully per-sector repair).

I'll rebase the branch in recent days to latest misc-next, but the
current branch is already good enough for full subapge RW support.

Thanks,
Qu

Thanks,
Qu


-ritesh

Reply via email to