On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen <erikjen...@rkjnsn.net> wrote:
>
> The offending system is indeed ARMv7 (specifically a Marvell ARMADA®
> 388), but I believe the Broadcom BCM2835 in my Raspberry Pi is
> actually ARMv6 (with hardware float support).
Using NBD, I have verified that I get the same error when attempting to
mount the filesystem on my ARMv6 Raspberry Pi:

[ 3491.339572] BTRFS info (device dm-4): disk space caching is enabled
[ 3491.394584] BTRFS info (device dm-4): has skinny extents
[ 3492.385095] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095
[ 3492.514071] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095
[ 3492.553599] BTRFS warning (device dm-4): failed to read tree root
[ 3492.865368] BTRFS error (device dm-4): open_ctree failed

The Raspberry Pi is running Linux 5.4.83.

> On Mon, Jan 18, 2021 at 4:01 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> >
> > On 2021/1/18 7:55 PM, Erik Jensen wrote:
> > > On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> > >> On 2021/1/18 6:33 PM, Erik Jensen wrote:
> > >>> I ended up having other priorities occupying my time since 2019, and the
> > >>> "solution" of exporting the individual drives on my NAS using NBD and
> > >>> mounting them on my desktop worked, even if it wasn't pretty.
> > >>>
> > >>> However, I am currently looking into Syncthing, which I would like to
> > >>> run on the NAS directly. That would, of course, require accessing the
> > >>> filesystem directly on the NAS rather than just exporting the raw
> > >>> devices, which means circling back to this issue.
> > >>>
> > >>> After updating my NAS, I have determined that the issue still occurs
> > >>> with Linux 5.8.
> > >>>
> > >>> What's the next best step for debugging the issue? Ideally, I'd like to
> > >>> help track down the issue and find a proper fix, rather than just trying
> > >>> to bypass it. I wasn't sure whether the suggestion to comment out
> > >>> btrfs_verify_dev_extents() was geared more toward the former or the
> > >>> latter.
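For readers following the Raspberry Pi log at the top of this message: "bad tree block start, want X have Y" means the logical address recorded in the header of the block btrfs just read (have) does not match the address it asked for (want). The sketch below mirrors only that comparison. It assumes the packed, little-endian layout of `struct btrfs_header` (32-byte csum, 16-byte fsid, then a 64-bit `bytenr`, and so on) and is not the kernel's code; `check_tree_block_start` and `fake_block` are hypothetical helpers.

```python
import struct
from typing import Optional

# On-disk tree block header (struct btrfs_header), packed and little-endian:
# csum[32], fsid[16], bytenr, flags, chunk_tree_uuid[16], generation,
# owner, nritems (u32), level (u8); 101 bytes in total.
BTRFS_HEADER = struct.Struct("<32s16sQQ16sQQIB")


def check_tree_block_start(block: bytes, want: int) -> Optional[str]:
    """Sketch of the comparison behind "bad tree block start".

    Returns an error string when the bytenr stored in the block's own
    header differs from the logical address the caller asked for.
    """
    (_csum, _fsid, have, _flags, _chunk_uuid,
     _gen, _owner, _nritems, _level) = BTRFS_HEADER.unpack_from(block)
    if have != want:
        return f"bad tree block start, want {want} have {have}"
    return None


def fake_block(bytenr: int) -> bytes:
    """Hypothetical helper: a header-only block with dummy values elsewhere."""
    return BTRFS_HEADER.pack(bytes(32), bytes(16), bytenr, 0, bytes(16), 1, 5, 0, 0)


if __name__ == "__main__":
    want = 26207780683776
    print(check_tree_block_start(fake_block(want), want))
    print(check_tree_block_start(fake_block(3395945502747707095), want))
```

The check says nothing about *why* the header is wrong; it only tells you the block that came back is not the one requested, which is why the debug patch further down dumps the other header fields too.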
> > >>
> > >> After refreshing my memory on this case, the problem is really that the
> > >> btrfs kernel code on ARM is reading garbage, while x86 or the ARM
> > >> user-space tools work as expected.
> > >>
> > >> Can you recompile your kernel on the ARM board to add extra debugging
> > >> messages?
> > >> If possible, we can try adding some extra debug points to bombard
> > >> your dmesg.
> > >>
> > >> Or do you have other ARM boards to test the same fs?
> > >>
> > >> Thanks,
> > >> Qu
> > >
> > > It's pretty easy to build a kernel with custom patches applied, though
> > > the actual building takes a while, so I'd be happy to add whatever
> > > debug messages would be useful. I also have an old Raspberry Pi
> > > (original model B) I can dig out and try to get going tomorrow. I
> > > can't hook it up to the drives directly, but I should be able to
> > > access them via NBD like I was doing from my desktop.
> >
> > An RPi 1B would be a little slow but should be enough to expose the
> > problem, if the problem affects all ARM builds (as long as you're also
> > using ARMv7 for the offending system).
> >
> > Thanks,
> > Qu
> >
> > > If I can't get
> > > that going for whatever reason, I could also try running an emulated
> > > ARM system with QEMU.
> > >
> > >>>
> > >>> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> > >>>
> > >>> On 2019/6/28 4:00 PM, Erik Jensen wrote:
> > >>> >> So it's either the block layer reading something wrong from the disk,
> > >>> >> or the btrfs layer not doing the correct endian conversion.
> > >>> >
> > >>> > My ARM board is running in little-endian mode, so it doesn't seem like
> > >>> > endianness should be an issue. (It is 32-bit versus my desktop's 64,
> > >>> > though.)
> > >>> > I've also tried exporting the drives via NBD to my x86_64
> > >>> > system, and that worked fine, so if the problem is below btrfs, it
> > >>> > would have to be in the encryption layer, but fsck succeeding on the
> > >>> > ARM board would seem to rule that out as well.
> > >>> >
> > >>> >> Would you dump the following data (x86 and ARM should output the same
> > >>> >> content, so one output is enough)?
> > >>> >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> > >>> >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> > >>> >
> > >>> > Attached, and also 17628705964032, since that's the block mentioned in
> > >>> > my most recent mount attempt (see below).
> > >>>
> > >>> The trees are completely fine.
> > >>>
> > >>> So something else must be causing the problem.
> > >>>
> > >>> >
> > >>> >> And then, on the ARM system, please apply the following diff and try
> > >>> >> mounting again.
> > >>> >> The diff adds extra debug info to examine the vital members of a tree block.
> > >>> >>
> > >>> >> A correct fs should output something like:
> > >>> >> BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
> > >>> >> tree block gen=4 owner=5 nritems=2 level=0
> > >>> >> csum:
> > >>> >> a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> > >>> >>
> > >>> >> The csum is the most important part. If it doesn't contain that many
> > >>> >> zeros, it means that at that point btrfs just got a bunch of garbage,
> > >>> >> and we can debug further.
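Erik notes above that the ARM board is 32-bit while the desktop is 64-bit. A quick piece of arithmetic suggests one way word size *could* matter here, offered as a hypothesis rather than a diagnosis: every failing block number in this thread lies above 2**44 bytes (16 TiB). If, as this sketch assumes, metadata pages are cached at index `logical >> PAGE_SHIFT` with 4 KiB pages, and that index is an `unsigned long` (32 bits on a 32-bit ARM kernel, 64 bits on x86_64), those indices would not fit in 32 bits. `page_index` and `wrapped_offset` are hypothetical helpers.

```python
# Illustration of a hypothesis only. Assumptions: 4 KiB pages, and a
# page-cache index that is 32 bits wide on 32-bit ARM, 64 bits on x86_64.
PAGE_SHIFT = 12                      # 4 KiB pages (assumed)
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def page_index(logical: int) -> int:
    """Page-cache index for a byte address, before any truncation."""
    return logical >> PAGE_SHIFT


def wrapped_offset(logical: int, index_bits: int = 32) -> int:
    """Byte address whose page slot a truncated index would alias."""
    index = page_index(logical) & ((1 << index_bits) - 1)
    return (index << PAGE_SHIFT) | (logical & PAGE_MASK)


# Highest byte address a 32-bit index can cover with 4 KiB pages: 2**44 (16 TiB).
LIMIT_32 = 1 << (PAGE_SHIFT + 32)

if __name__ == "__main__":
    # Block numbers quoted in this thread.
    for addr in (17628705964032, 17628726968320, 17628727001088, 26207780683776):
        print(f"{addr}: above 16 TiB={addr >= LIMIT_32}, "
              f"index={page_index(addr)}, 32-bit wrap -> {wrapped_offset(addr)}, "
              f"64-bit -> {wrapped_offset(addr, 64)}")
```

On a 64-bit index nothing wraps, which would match the x86_64 desktop working over NBD. A deterministic wrap like this would also be consistent with the observation further down that a second read returns exactly the same wrong header.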
> > >>> >
> > >>> > [ 131.725573] BTRFS info (device dm-1): disk space caching is enabled
> > >>> > [ 131.731884] BTRFS info (device dm-1): has skinny extents
> > >>> > [ 133.046145] BTRFS error (device dm-1): bad tree block start, want 17628705964032 have 2807793151171243621
> > >>> > [ 133.055775] tree block gen=7888986126946982446 owner=11331573954727661546 nritems=4191910623 level=112
> > >>> > [ 133.065661] csum:
> > >>> > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> > >>>
> > >>> Complete garbage here, so I'd say the data we got isn't what we want.
> > >>>
> > >>> > [ 133.108383] BTRFS error (device dm-1): bad tree block start, want 17628705964032 have 2807793151171243621
> > >>> > [ 133.117999] tree block gen=7888986126946982446 owner=11331573954727661546 nritems=4191910623 level=112
> > >>> > [ 133.127756] csum:
> > >>> > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> > >>>
> > >>> But strangely, the second try still gives us the same result. If it
> > >>> were really random garbage, we should get a different result.
> > >>>
> > >>> > [ 133.136241] BTRFS error (device dm-1): failed to verify dev extents against chunks: -5
> > >>>
> > >>> You can try to skip the dev extents verification by commenting out the
> > >>> btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
> > >>>
> > >>> It may fail at another location, though.
> > >>>
> > >>> The stranger part is that the device tree root node was read
> > >>> without problem.
> > >>>
> > >>> Thanks,
> > >>> Qu
> > >>>
> > >>> > [ 133.166165] BTRFS error (device dm-1): open_ctree failed
> > >>> >
> > >>> > I copied some files over the last time I had it mounted on my desktop,
> > >>> > which may be why it's now failing at a different block.
> > >>> >
> > >>> > Thanks!
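Qu's verdict that the decoded header is "complete garbage" can be made concrete with two sanity bounds. This sketch assumes a 16 KiB nodesize (the mkfs.btrfs default; the thread never states this filesystem's actual nodesize) and the packed on-disk sizes of `struct btrfs_header` (101 bytes), `struct btrfs_item` (25 bytes) and `struct btrfs_key_ptr` (33 bytes). `implausible_fields` is a hypothetical helper, not a btrfs API; the kernel's own tree checker does its own, more thorough validation.

```python
from typing import List

# Packed on-disk sizes (bytes) of the relevant btrfs structures.
HEADER_SIZE = 101     # struct btrfs_header
ITEM_SIZE = 25        # struct btrfs_item: 17-byte disk key + u32 offset + u32 size
KEY_PTR_SIZE = 33     # struct btrfs_key_ptr: 17-byte disk key + u64 blockptr + u64 generation
BTRFS_MAX_LEVEL = 8   # valid tree levels are 0..7


def implausible_fields(level: int, nritems: int, nodesize: int = 16384) -> List[str]:
    """Return reasons a decoded header cannot describe a real tree block.

    nodesize defaults to 16 KiB, an assumption (mkfs.btrfs default).
    """
    problems = []
    if level >= BTRFS_MAX_LEVEL:
        problems.append(f"level {level} >= {BTRFS_MAX_LEVEL}")
    # Leaves (level 0) hold items; internal nodes hold key pointers.
    # This is only an upper bound: real leaf items also need data space.
    per_entry = ITEM_SIZE if level == 0 else KEY_PTR_SIZE
    max_entries = (nodesize - HEADER_SIZE) // per_entry
    if nritems > max_entries:
        problems.append(f"nritems {nritems} > {max_entries}")
    return problems


if __name__ == "__main__":
    # Values printed by the debug patch on the ARM board.
    print(implausible_fields(level=112, nritems=4191910623))
    # Values from the "correct fs" example output.
    print(implausible_fields(level=0, nritems=2))
```

Both bounds fail by orders of magnitude for the ARM output, so the block cannot be a damaged copy of a real node; it is some unrelated data, which fits Qu's reading.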