On Fri, 13 Oct 2023 at 19:06, Brian Foster <[email protected]> wrote:
> On Tue, Oct 10, 2023 at 11:20:59PM +0800, Coly Li wrote:
> > Forward it to [email protected]
> >
> > > On 10 Oct 2023, at 07:43, Daniel J Blueman <[email protected]> wrote:
> > >
> > > Firstly, bcachefs introduces a new era of in-tree filesystems with
> > > some monumental features (sorry, ZFS); hats off to Kent for landing
> > > this!
> > >
> > > My testing finds it is in great shape; far better than BTRFS was when
> > > it landed. Testing on linux next-20231005 with additional debug checks
> > > atop the Ubuntu 23.04 kernel generic config [1], I was able to provoke
> > > a btree trans path overflow corner case [2].
>
> Thanks for testing and for the bug report. I'm curious what kind of
> testing you've been doing in general and perhaps if you have any more
> interesting results to share? ;)

Thanks for asking, Brian! My validation seeks to exercise edge cases by
triggering multiple conditions in parallel, then working backwards to
identify a minimal, reliable reproducer.

I'll prepare a post with additional early findings in the coming days.

> > > The minimal reproducer is:
> > > # modprobe brd rd_nr=2 rd_size=1048576
> > > # bcachefs format --metadata_replicas=2 --label=tier1.1 /dev/ram0 \
> > >     --label=tier1.2 /dev/ram1
> > > # mount -t bcachefs /dev/ram0:/dev/ram1 /mnt
> > > # dd if=/dev/zero of=/mnt/test bs=128M
> > >
> > > The issue doesn't reproduce with metadata_replicas=1 or a single
> > > block device.
>
> My naive assumption would be that the higher replica count increases the
> size of some of these transactions due to having to update multiple
> devices, but this is not an area I've dug into in depth tbh. I've not
> seen this yet myself, but I think most of the multi device testing I've
> done so far has still been with replicas=1.

With the stock Ubuntu kernel config (no CONFIG_LOCKDEP, so likely
similar to your setup), my testing eventually provoked a crash; I'll
see if I can reproduce it later, in case it's the same path as this
report with CONFIG_LOCKDEP.

> > > At debug entry, I couldn't determine why BTREE_ITER_MAX must be 64
> > > rather than 32 when CONFIG_LOCKDEP is set. However, the panic doesn't
> > > occur without CONFIG_LOCKDEP, so it appears related; keeping it at
> > > 32 with CONFIG_LOCKDEP doesn't prevent the panic either.
> > >
> > > @Kent/anyone?
>
> Hmm, that is interesting. I'm not sure why lockdep would be a factor
> here. Where did you come up with the notion that ITER_MAX could be 32
> (and the idea to test it) with CONFIG_LOCKDEP=y? It looks like it's
> hardcoded to 64, but perhaps I'm missing something.

See 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/bcachefs/btree_types.h#n390

#ifndef CONFIG_LOCKDEP
#define BTREE_ITER_MAX 64
#else
#define BTREE_ITER_MAX 32
#endif

> We'll see if Kent can make more sense of the crash signature whenever
> he's back around. Otherwise I'll make a note to try and reproduce it
> once I work through some other things...

It would be good to get some independent confirmation on this.

Thanks again!
  Dan
-- 
Daniel J Blueman
