On Fri, 13 Oct 2023 at 19:06, Brian Foster <[email protected]> wrote:
> On Tue, Oct 10, 2023 at 11:20:59PM +0800, Coly Li wrote:
> > Forwarding to [email protected]
> >
> > > On 10 Oct 2023 at 07:43, Daniel J Blueman <[email protected]> wrote:
> > >
> > > Firstly, bcachefs introduces a new era of in-tree filesystems with
> > > some monumental features (sorry, ZFS); hats off to Kent for landing
> > > this!
> > >
> > > My testing finds it is in great shape; far better than BTRFS was
> > > when it landed. Testing on linux next-20231005 with additional
> > > debug checks atop the Ubuntu 23.04 kernel generic config [1], I was
> > > able to provoke a btree trans path overflow cornercase [2].
>
> Thanks for testing and for the bug report. I'm curious what kind of
> testing you've been doing in general and perhaps if you have any more
> interesting results to share? ;)
Thanks for asking, Brian! My validation seeks to exercise edge cases by
triggering multiple conditions in parallel, then working backwards to
identify a minimal and reliable reproducer. I'll prepare a post with
additional early findings in the coming days.

> > > The minimal reproducer is:
> > >
> > > # modprobe brd rd_nr=2 rd_size=1048576
> > > # bcachefs format --metadata_replicas=2 --label=tier1.1 /dev/ram0 \
> > >       --label=tier1.2 /dev/ram1
> > > # mount -t bcachefs /dev/ram0:/dev/ram1 /mnt
> > > # dd if=/dev/zero of=/mnt/test bs=128M
> > >
> > > The issue doesn't reproduce with metadata_replicas=1 or with a
> > > single block device.
>
> My naive assumption would be that the higher replica count increases
> the size of some of these transactions due to having to update
> multiple devices, but this is not an area I've dug into in depth tbh.
> I've not seen this yet myself, but I think most of the multi device
> testing I've done so far has still been with replicas=1.

With the stock Ubuntu kernel config (no CONFIG_LOCKDEP, likely similar
to your codebase), my testing eventually provoked a crash; I'll see if
I can reproduce it later, in case it's the same path as this report
with CONFIG_LOCKDEP.

> > > At debug entry, I couldn't determine why BTREE_ITER_MAX must be 64
> > > rather than 32 when CONFIG_LOCKDEP is set; however, the panic
> > > doesn't occur without CONFIG_LOCKDEP, so it appears related.
> > > Keeping it at 32 with CONFIG_LOCKDEP doesn't prevent the panic
> > > either.
> > >
> > > @Kent/anyone?
>
> Hmm, that is interesting. I'm not sure why lockdep would be a factor
> here. Where did you come up with the notion that ITER_MAX could be 32
> (and the idea to test it) with CONFIG_LOCKDEP=y? It looks like it's
> hardcoded to 64, but perhaps I'm missing something.
See
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/bcachefs/btree_types.h#n390

#ifndef CONFIG_LOCKDEP
#define BTREE_ITER_MAX		64
#else
#define BTREE_ITER_MAX		32
#endif

> We'll see if Kent can make more sense of the crash signature whenever
> he's back around. Otherwise I'll make a note to try and reproduce it
> once I work through some other things...

It would be good to get some independent confirmation on this. Thanks
again!

Dan
--
Daniel J Blueman
