On 21/04/12 07:33PM, Qu Wenruo wrote:
>
>
> On 2021/4/2 4:52 PM, Qu Wenruo wrote:
> >
> >
> > On 2021/4/2 4:46 PM, Ritesh Harjani wrote:
> > > On 21/04/02 04:36PM, Qu Wenruo wrote:
> > > >
> > > >
> > > > On 2021/4/2 4:33 PM, Ritesh Harjani wrote:
> > > > > On 21/03/29 10:01AM, Qu Wenruo wrote:
> > > > > >
> > > > > >
> > > > > > On 2021/3/29 4:02 AM, Ritesh Harjani wrote:
> > > > > > > On 21/03/25 09:16PM, Qu Wenruo wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2021/3/25 8:20 PM, Neal Gompa wrote:
> > > > > > > > > On Thu, Mar 25, 2021 at 3:17 AM Qu Wenruo <w...@suse.com> wrote:
> > > > > > > > > >
> > > > > > > > > > This patchset can be fetched from the following github repo, along with
> > > > > > > > > > the full subpage RW support:
> > > > > > > > > > https://github.com/adam900710/linux/tree/subpage
> > > > > > > > > >
> > > > > > > > > > This patchset is for metadata read write support.
> > > > > > > > > >
> > > > > > > > > > [FULL RW TEST]
> > > > > > > > > > Since the data write path is not included in this patchset, we can't
> > > > > > > > > > really test the patchset itself, but anyone can grab the patch from
> > > > > > > > > > github repo and do fstests/generic tests.
> > > > > > > > > >
> > > > > > > > > > But at least the full RW patchset can pass -g generic/quick -x defrag
> > > > > > > > > > for now.
> > > > > > > > > >
> > > > > > > > > > There are some known issues:
> > > > > > > > > >
> > > > > > > > > > - Defrag behavior change
> > > > > > > > > >   Since current defrag is doing per-page defrag, to support subpage
> > > > > > > > > >   defrag, we need some change in the loop.
> > > > > > > > > >   E.g.
> > > > > > > > > >   if a page has both hole and regular extents in it, then defrag
> > > > > > > > > >   will rewrite the full 64K page.
> > > > > > > > > >
> > > > > > > > > >   Thus for now, defrag related failure is expected.
> > > > > > > > > >   But this should only cause behavior difference, no crash nor hang is
> > > > > > > > > >   expected.
> > > > > > > > > >
> > > > > > > > > > - No compression support yet
> > > > > > > > > >   There are at least 2 known bugs if forcing compression for subpage
> > > > > > > > > >   * Some hard coded PAGE_SIZE screwing up space rsv
> > > > > > > > > >   * Subpage ASSERT() triggered
> > > > > > > > > >     This is because some compression code is unlocking locked_page by
> > > > > > > > > >     calling extent_clear_unlock_delalloc() with locked_page == NULL.
> > > > > > > > > >   So for now compression is also disabled.
> > > > > > > > > >
> > > > > > > > > > - Inode nbytes mismatch
> > > > > > > > > >   Still debugging.
> > > > > > > > > >   The fastest way to trigger is fsx using the following parameters:
> > > > > > > > > >
> > > > > > > > > >     fsx -l 262144 -o 65536 -S 30073 -N 256 -R -W $mnt/file > /tmp/fsx
> > > > > > > > > >
> > > > > > > > > >   This causes inode nbytes to differ from the expected value and
> > > > > > > > > >   triggers a btrfs check error.
> > > > > > > > > >
> > > > > > > > > > [DIFFERENCE AGAINST REGULAR SECTORSIZE]
> > > > > > > > > > The metadata part in fact has more new code than data part, as it has
> > > > > > > > > > some different behaviors compared to the regular sector size handling:
> > > > > > > > > >
> > > > > > > > > > - No more page locking
> > > > > > > > > >   Now metadata read/write relies on extent io tree locking, rather than
> > > > > > > > > >   page locking.
> > > > > > > > > >   This is to allow behaviors like read locking one eb while also trying to
> > > > > > > > > >   read lock another eb in the same page.
> > > > > > > > > >   We can't rely on page lock as now we have multiple extent buffers in
> > > > > > > > > >   the same page.
> > > > > > > > > >
> > > > > > > > > > - Page status update
> > > > > > > > > >   Now we use subpage wrappers to handle page status update.
> > > > > > > > > >
> > > > > > > > > > - How to submit dirty extent buffers
> > > > > > > > > >   Instead of just grabbing extent buffer from page::private, we need to
> > > > > > > > > >   iterate all dirty extent buffers in the page and submit them.
> > > > > > > > > >
> > > > > > > > > > [CHANGELOG]
> > > > > > > > > > v2:
> > > > > > > > > > - Rebased to latest misc-next
> > > > > > > > > >   No conflicts at all.
> > > > > > > > > >
> > > > > > > > > > - Add new sysfs interface to grab supported RO/RW sectorsize
> > > > > > > > > >   This will allow mkfs.btrfs to detect unmountable fs better.
> > > > > > > > > >
> > > > > > > > > > - Use newer naming schema for each patch
> > > > > > > > > >   No more "extent_io:" or "inode:" schema anymore.
> > > > > > > > > >
> > > > > > > > > > - Move two pure cleanups to the series
> > > > > > > > > >   Patch 2~3, originally in RW part.
> > > > > > > > > >
> > > > > > > > > > - Fix one uninitialized variable
> > > > > > > > > >   Patch 6.
> > > > > > > > > >
> > > > > > > > > > v3:
> > > > > > > > > > - Rename the sysfs to supported_sectorsizes
> > > > > > > > > >
> > > > > > > > > > - Rebased to latest misc-next branch
> > > > > > > > > >   This removes 2 cleanup patches.
> > > > > > > > > >
> > > > > > > > > > - Add new overview comment for subpage metadata
> > > > > > > > > >
> > > > > > > > > > Qu Wenruo (13):
> > > > > > > > > >   btrfs: add sysfs interface for supported sectorsize
> > > > > > > > > >   btrfs: use min() to replace open-code in btrfs_invalidatepage()
> > > > > > > > > >   btrfs: remove unnecessary variable shadowing in btrfs_invalidatepage()
> > > > > > > > > >   btrfs: refactor how we iterate ordered extent in btrfs_invalidatepage()
> > > > > > > > > >   btrfs: introduce helpers for subpage dirty status
> > > > > > > > > >   btrfs: introduce helpers for subpage writeback status
> > > > > > > > > >   btrfs: allow btree_set_page_dirty() to do more sanity check on subpage
> > > > > > > > > >     metadata
> > > > > > > > > >   btrfs: support subpage metadata csum calculation at write time
> > > > > > > > > >   btrfs: make alloc_extent_buffer() check subpage dirty bitmap
> > > > > > > > > >   btrfs: make the page uptodate assert to be subpage compatible
> > > > > > > > > >   btrfs: make set/clear_extent_buffer_dirty() to be subpage compatible
> > > > > > > > > >   btrfs: make set_btree_ioerr() accept extent buffer and to be subpage
> > > > > > > > > >     compatible
> > > > > > > > > >   btrfs: add subpage overview comments
> > > > > > > > > >
> > > > > > > > > >  fs/btrfs/disk-io.c   | 143 ++++++++++++++++++++++++++++++++++---------
> > > > > > > > > >  fs/btrfs/extent_io.c | 127 ++++++++++++++++++++++++++++----------
> > > > > > > > > >  fs/btrfs/inode.c     | 128 ++++++++++++++++++++++----------------
> > > > > > > > > >  fs/btrfs/subpage.c   | 127 ++++++++++++++++++++++++++++++++++++++
> > > > > > > > > >  fs/btrfs/subpage.h   |  17 +++++
> > > > > > > > > >  fs/btrfs/sysfs.c     |  15 +++++
> > > > > > > > > >  6 files changed, 441 insertions(+), 116 deletions(-)
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.30.1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Why wouldn't we just integrate full read-write support with the
> > > > > > > > > caveats as described now? It seems to be relatively reasonable to do
> > > > > > > > > that, and this patch set is essentially unusable without the rest of
> > > > > > > > > it that does enable full read-write support.
> > > > > > > >
> > > > > > > > The metadata part is much more stable than data path (almost not touched
> > > > > > > > for several months), and the metadata part already has some difference
> > > > > > > > in its behavior, which needs review.
> > > > > > > >
> > > > > > > > Your point makes some sense, but I still don't believe pushing a super
> > > > > > > > large patchset does any help for the review.
> > > > > > > >
> > > > > > > > If you want to test, you can grab the branch from the github repo.
> > > > > > > > If you want to review, the mails are all here for review.
> > > > > > > >
> > > > > > > > In fact, we used to have subpage support sent as a big patchset from IBM
> > > > > > > > guys, but the result is only some preparation patches get merged, and
> > > > > > > > nothing more.
> > > > > > > >
> > > > > > > > Using this multi-series method, we're already doing better work and
> > > > > > > > received more testing (to ensure regular sectorsize is not affected at
> > > > > > > > least).
> > > > > > >
> > > > > > > Hi Qu Wenruo,
> > > > > > >
> > > > > > > Sorry about chiming in late on this. I don't have any strong objection on either
> > > > > > > approach. Although sometime back when I tested your RW support git tree on
> > > > > > > Power, the unmount patch itself was crashing. I didn't debug it that time
> > > > > > > (this was a month back or so), so I also didn't bother testing
> > > > > > > xfstests on Power.
> > > > > > >
> > > > > > > But we do have an interest in making sure this patch series works on bs < ps
> > > > > > > on Power platform. I can try helping with testing, reviewing (to best of my
> > > > > > > knowledge) and fixing anything if possible :)
> > > > > >
> > > > > > That's great!
> > > > > >
> > > > > > One of my biggest problems here is, I don't have good enough testing
> > > > > > environment.
> > > > > >
> > > > > > Although SUSE has internal clouds for ARM64/PPC64, due to the
> > > > > > f**king Great Firewall, it's super slow to access, not to mention doing
> > > > > > proper debugging.
> > > > > >
> > > > > > Currently I'm using two ARM SBCs, RK3399 and A311D based, to do the
> > > > > > test.
> > > > > > But their computing power is far from ideal, only generic/quick can
> > > > > > finish in hours.
> > > > > >
> > > > > > Thus real world Power could definitely help.
> > > > > > >
> > > > > > > Let me try and pull your tree and test it on Power. Please let me know if there
> > > > > > > is anything that needs to be taken care of apart from your github tree and
> > > > > > > btrfs-progs branch with bs < ps support.
> > > > > >
> > > > > > If you're going to test the branch, here are some small notes:
> > > > > >
> > > > > > - Need to use latest btrfs-progs
> > > > > >   As it fixes a false alert on crossing 64K page boundary.
> > > > > >
> > > > > > - Need to slightly modify btrfs-progs to avoid false alerts
> > > > > >   For subpage case, mkfs.btrfs will output a warning, but that warning
> > > > > >   is output to stderr, which will screw up generic test groups.
> > > > > >   It's recommended to apply the following diff:
> > > > > >
> > > > > > diff --git a/common/fsfeatures.c b/common/fsfeatures.c
> > > > > > index 569208a9..21976554 100644
> > > > > > --- a/common/fsfeatures.c
> > > > > > +++ b/common/fsfeatures.c
> > > > > > @@ -341,8 +341,8 @@ int btrfs_check_sectorsize(u32 sectorsize)
> > > > > >                 return -EINVAL;
> > > > > >         }
> > > > > >         if (page_size != sectorsize)
> > > > > > -               warning(
> > > > > > -"the filesystem may not be mountable, sectorsize %u doesn't match page size %u",
> > > > > > +               printf(
> > > > > > +"the filesystem may not be mountable, sectorsize %u doesn't match page size %u\n",
> > > > > >                         sectorsize, page_size);
> > > > > >         return 0;
> > > > > >  }
> > > > > >
> > > > > > - Xfstest/btrfs group will crash at btrfs/143
> > > > > >   Still investigating, but you can ignore btrfs group for now.
> > > > > >
> > > > > > - Very rare hang
> > > > > >   There is a very low chance to hang, with "bad ordered accounting"
> > > > > >   dmesg.
> > > > > >   If you can hit it, please let me know.
> > > > > >   I have some idea how to fix it, but it's not yet in the branch.
> > > > > >
> > > > > > - btrfs inode nbytes mismatch
> > > > > >   Investigating, as it will make btrfs-check report an error.
> > > > > >
> > > > > > The last two bugs are the final show blockers, I'll give you extra
> > > > > > updates when those are fixed.
> > > > >
> > > > > Thanks Qu Wenruo, for above info.
> > > > > I cloned below git tree as mentioned in your git log to test for RW on Power.
> > > > > However, I still see that RW mount for bs < ps is disabled in open_ctree()
> > > > > https://github.com/adam900710/linux/tree/subpage
> > > > >
> > > > > I see below code present in this tree.
> > > > >         /* For 4K sector size support, it's only read-only */
> > > > >         if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
> > > > >                 if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
> > > > >                         btrfs_err(fs_info,
> > > > >         "subpage sectorsize %u only supported read-only for page size %lu",
> > > > >                                 sectorsize, PAGE_SIZE);
> > > > >                         err = -EINVAL;
> > > > >                         goto fail_alloc;
> > > > >                 }
> > > > >         }
> > > > >
> > > > > Could you pls point me to the tree I can use for bs < ps testing on Power?
> > > > > Sorry if I missed something.
> > > >
> > > > Sorry, I updated the branch to my current development progress, it's now
> > > > at the ordered extent rework part, without the remaining subpage
> > > > functionality at all.
> > > >
> > > > You may want to grab this tree instead:
> > > > https://github.com/adam900710/linux/tree/subpage_old
> > > >
> > > > But please keep in mind that you may get random hangs, and certain
> > > > generic test cases, especially generic/075, can corrupt the inode nbytes,
> > > > leaving all later test cases using TEST_DEV to report errors on fsck.
> > > >
> > >
> > > Thanks for quick response. Sure, I will exclude generic/075 from the test
> > > for now.
> >
> > Not only generic/075, but all tests running fsx may cause inode nbytes
> > corruption.
> >
> > Thus I'd recommend either modifying btrfs-check to ignore it, or re-mkfs on
> > TEST_DEV after each test case.
>
> Good news, you can fetch the subpage branch for better test results.
>
> Now the branch should pass all generic tests, except defrag and known
> failures.
> And no more random crash during the tests.

Thanks, let me test it on PPC64 box.

-ritesh

>
> And for btrfs/143, it will no longer trigger a BUG_ON(), although at the
> cost of worse granularity for repair.
> (Now it's per-bvec repair, not yet fully per-sector repair).
>
> I'll rebase the branch in the next few days to latest misc-next, but the
> current branch is already good enough for full subpage RW support.
>
> Thanks,
> Qu
> >
> > Thanks,
> > Qu
> >
> > >
> > > -ritesh
> > >
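
For anyone following along, the "iterate all dirty extent buffers in the page and submit them" point from the cover letter can be pictured with a small userspace sketch. This is not the code in fs/btrfs/subpage.c: the sketch_* names, the 16-bit per-page bitmap, the 16K nodesize, and the assumption that extent buffers are nodesize-aligned inside the page are all made up for illustration. It only shows why page::private can no longer point at a single extent buffer once a 64K page holds several of them.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative geometry only (assumptions, not taken from the patchset) */
#define SKETCH_PAGE_SIZE  65536u  /* 64K page, as on the Power/ARM64 boxes */
#define SKETCH_SECTORSIZE 4096u   /* 4K sectorsize */
#define SKETCH_NODESIZE   16384u  /* 16K tree block */

#define SKETCH_SECTORS_PER_EB (SKETCH_NODESIZE / SKETCH_SECTORSIZE) /* 4 */

/*
 * Mark every sector covered by the extent buffer at byte offset
 * eb_offset (relative to the page start) as dirty.
 * One bit per 4K sector, so 16 bits describe the whole 64K page.
 */
static uint16_t sketch_set_eb_dirty(uint16_t bitmap, uint32_t eb_offset)
{
	/* Hypothetical restriction: ebs are nodesize-aligned in the page */
	assert(eb_offset % SKETCH_NODESIZE == 0);
	assert(eb_offset + SKETCH_NODESIZE <= SKETCH_PAGE_SIZE);

	uint32_t first_bit = eb_offset / SKETCH_SECTORSIZE;
	uint16_t mask = (uint16_t)(((1u << SKETCH_SECTORS_PER_EB) - 1u) << first_bit);

	return (uint16_t)(bitmap | mask);
}

/*
 * Walk the page one tree block at a time and record the page offset of
 * every extent buffer whose first sector is dirty. A real writeback path
 * would look up and submit the eb here instead of recording its offset.
 * Returns the number of offsets stored (at most max).
 */
static int sketch_collect_dirty_ebs(uint16_t bitmap, uint32_t offsets[], int max)
{
	int found = 0;

	for (uint32_t off = 0; off < SKETCH_PAGE_SIZE; off += SKETCH_NODESIZE) {
		uint32_t first_bit = off / SKETCH_SECTORSIZE;

		if (!((bitmap >> first_bit) & 1u))
			continue;
		if (found == max)
			break;
		offsets[found++] = off;
	}
	return found;
}
```

With the regular sectorsize == page size case there is at most one extent buffer per page, so this loop collapses to a single lookup; with subpage, the bitmap is what tells writeback which of the tree blocks sharing the page actually need to go out.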