On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen <erikjen...@rkjnsn.net> wrote:
>
> The offending system is indeed ARMv7 (specifically a Marvell ARMADA®
> 388), but I believe the Broadcom BCM2835 in my Raspberry Pi is
> actually ARMv6 (with hardware float support).

Using NBD, I have verified that I receive the same error when
attempting to mount the filesystem on my ARMv6 Raspberry Pi:
[ 3491.339572] BTRFS info (device dm-4): disk space caching is enabled
[ 3491.394584] BTRFS info (device dm-4): has skinny extents
[ 3492.385095] BTRFS error (device dm-4): bad tree block start, want
26207780683776 have 3395945502747707095
[ 3492.514071] BTRFS error (device dm-4): bad tree block start, want
26207780683776 have 3395945502747707095
[ 3492.553599] BTRFS warning (device dm-4): failed to read tree root
[ 3492.865368] BTRFS error (device dm-4): open_ctree failed

The Raspberry Pi is running Linux 5.4.83.
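
For anyone following along, the "want"/"have" pair in the error above compares the logical address btrfs asked for with the bytenr field stored in the header of the block it got back. Below is a minimal user-space sketch for decoding those header fields from a raw block dump. It assumes the usual packed, little-endian btrfs_header layout (32-byte csum, 16-byte fsid, bytenr, flags, 16-byte chunk tree UUID, generation, owner, nritems, level). The function names are made up for illustration and are not part of btrfs-progs.

```python
import struct

# Assumed on-disk layout of struct btrfs_header (packed, little-endian):
#   csum[32], fsid[16], bytenr u64, flags u64, chunk_tree_uuid[16],
#   generation u64, owner u64, nritems u32, level u8
HEADER = struct.Struct("<32s16sQQ16sQQIB")


def parse_tree_block_header(block: bytes) -> dict:
    """Decode the header fields at the start of a raw tree block."""
    (csum, fsid, bytenr, flags, chunk_uuid,
     generation, owner, nritems, level) = HEADER.unpack_from(block)
    return {
        "csum": csum.hex(),
        "fsid": fsid.hex(),
        "bytenr": bytenr,
        "flags": flags,
        "chunk_tree_uuid": chunk_uuid.hex(),
        "generation": generation,
        "owner": owner,
        "nritems": nritems,
        "level": level,
    }


def start_matches(block: bytes, want: int) -> bool:
    """Mimic the 'bad tree block start' comparison: stored bytenr vs. wanted."""
    return parse_tree_block_header(block)["bytenr"] == want


if __name__ == "__main__":
    # Synthetic block using the "correct fs" values quoted further down
    # (gen=4 owner=5 nritems=2 level=0); not data from the real filesystem.
    fake = HEADER.pack(b"\x00" * 32, b"\x11" * 16, 30408704, 0,
                       b"\x22" * 16, 4, 5, 2, 0)
    hdr = parse_tree_block_header(fake)
    print("tree block gen=%(generation)d owner=%(owner)d "
          "nritems=%(nritems)d level=%(level)d" % hdr)
    print("start ok:", start_matches(fake, 30408704))
```

Feeding it the same block read on both the ARM box and the desktop would show whether the bytes differ before btrfs ever interprets them.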

> On Mon, Jan 18, 2021 at 4:01 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> >
> >
> >
> > On 2021/1/18 7:55 PM, Erik Jensen wrote:
> > > On Mon, Jan 18, 2021 at 3:07 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> > >> On 2021/1/18 6:33 PM, Erik Jensen wrote:
> > >>> I ended up having other priorities occupying my time since 2019, and the
> > >>> "solution" of exporting the individual drives on my NAS using NBD and
> > >>> mounting them on my desktop worked, even if it wasn't pretty.
> > >>>
> > >>> However, I am currently looking into Syncthing, which I would like to
> > >>> run on the NAS directly. That would, of course, require accessing the
> > >>> filesystem directly on the NAS rather than just exporting the raw
> > >>> devices, which means circling back to this issue.
> > >>>
> > >>> After updating my NAS, I have determined that the issue still occurs
> > >>> with Linux 5.8.
> > >>>
> > >>> What's the next best step for debugging the issue? Ideally, I'd like to
> > >>> help track down the issue to find a proper fix, rather than just trying
> > >>> to bypass the issue. I wasn't sure if the suggestion to comment out
> > >>> btrfs_verify_dev_extents() was more geared toward the former or the 
> > >>> latter.
> > >>
> > >> After rewinding my memory on this case, the problem is really that the
> > >> ARM btrfs kernel is reading garbage, while the x86 and ARM user space
> > >> tools work as expected.
> > >>
> > >> Can you recompile your kernel on the ARM board to add extra debugging
> > >> messages?
> > >> If possible, we can try to add some extra debug points to bombard
> > >> your dmesg.
> > >>
> > >> Or do you have other ARM boards to test the same fs?
> > >>
> > >>
> > >> Thanks,
> > >> Qu
> > >
> > > It's pretty easy to build a kernel with custom patches applied, though
> > > the actual building takes a while, so I'd be happy to add whatever
> > > debug messages would be useful. I also have an old Raspberry Pi
> > > (original model B) I can dig out and try to get going, tomorrow. I
> > > can't hook it up to the drives directly, but I should be able to
> > > access them via NBD like I was doing from my desktop.
> >
> > RPI 1B would be a little slow but should be enough to expose the
> > problem, if the problem is for all arm builds (as long as you're also
> > using armv7 for the offending system).
> >
> > Thanks,
> > Qu
> >
> > > If I can't get
> > > that going for whatever reason, I could also try running an emulated
> > > ARM system with QEMU.
> > >
> > >>>
> > >>> On Fri, Jun 28, 2019 at 1:15 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>>      On 2019/6/28 4:00 PM, Erik Jensen wrote:
> > >>>       >> So it's either the block layer reading something wrong from the
> > >>>       >> disk, or the btrfs layer not doing the endian conversion correctly.
> > >>>       >
> > >>>       > My ARM board is running in little endian mode, so it doesn't seem
> > >>>       > like endianness should be an issue. (It is 32-bit versus my
> > >>>       > desktop's 64, though.) I've also tried exporting the drives via NBD
> > >>>       > to my x86_64 system, and that worked fine, so if the problem is
> > >>>       > under btrfs, it would have to be in the encryption layer, but fsck
> > >>>       > succeeding on the ARM board would seem to rule that out, as well.
> > >>>       >
> > >>>       >> Would you dump the following data (X86 and ARM should output the
> > >>>       >> same content, thus one output is enough).
> > >>>       >> # btrfs ins dump-tree -b 17628726968320 /dev/dm-3
> > >>>       >> # btrfs ins dump-tree -b 17628727001088 /dev/dm-3
> > >>>       >
> > >>>       > Attached, and also 17628705964032, since that's the block
> > >>>       > mentioned in my most recent mount attempt (see below).
> > >>>
> > >>>      The trees are completely fine.
> > >>>
> > >>>      So it should be something else causing the problem.
> > >>>
> > >>>       >
> > >>>       >> And then, for the ARM system, please apply the following diff,
> > >>>       >> and try to mount again.
> > >>>       >> The diff adds extra debug info, to examine the vital members of a
> > >>>       >> tree block.
> > >>>       >>
> > >>>       >> Correct fs should output something like:
> > >>>       >>   BTRFS error (device dm-4): bad tree block start, want 30408704 have 0
> > >>>       >>   tree block gen=4 owner=5 nritems=2 level=0
> > >>>       >>   csum: a304e483-0000-0000-0000-00000000000000000000-0000-0000-0000-000000000000
> > >>>       >>
> > >>>       >> The csum is the most important one: if it doesn't contain that many
> > >>>       >> zeros, it means that at that point btrfs just got a bunch of
> > >>>       >> garbage, and we can debug further.
> > >>>       >
> > >>>       > [  131.725573] BTRFS info (device dm-1): disk space caching is enabled
> > >>>       > [  131.731884] BTRFS info (device dm-1): has skinny extents
> > >>>       > [  133.046145] BTRFS error (device dm-1): bad tree block start, want
> > >>>       > 17628705964032 have 2807793151171243621
> > >>>       > [  133.055775] tree block gen=7888986126946982446
> > >>>       > owner=11331573954727661546 nritems=4191910623 level=112
> > >>>       > [  133.065661] csum:
> > >>>       > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> > >>>
> > >>>      Completely garbage here, so I'd say the data we got isn't what we want.
> > >>>
> > >>>       > [  133.108383] BTRFS error (device dm-1): bad tree block start, want
> > >>>       > 17628705964032 have 2807793151171243621
> > >>>       > [  133.117999] tree block gen=7888986126946982446
> > >>>       > owner=11331573954727661546 nritems=4191910623 level=112
> > >>>       > [  133.127756] csum:
> > >>>       > 416a456c-1e68-dbc3-185d-aaad410beaef5493ab3f-3cb9-4ba1-2214-b41cba9656fc
> > >>>
> > >>>      But strangely, the 2nd try still gives us the same result; if it
> > >>>      were really random garbage, we should get a different result.
> > >>>
> > >>>       > [  133.136241] BTRFS error (device dm-1): failed to verify dev
> > >>>       > extents against chunks: -5
> > >>>
> > >>>      You can try to skip the dev extents verification by commenting out
> > >>>      the btrfs_verify_dev_extents() call in disk-io.c::open_ctree().
> > >>>
> > >>>      It may fail at another location though.
> > >>>
> > >>>      The stranger part is that the device tree root node was read out
> > >>>      without problem.
> > >>>
> > >>>      Thanks,
> > >>>      Qu
> > >>>
> > >>>       > [  133.166165] BTRFS error (device dm-1): open_ctree failed
> > >>>       >
> > >>>       > I copied some files over last time I had it mounted on my desktop,
> > >>>       > which may be why it's now failing at a different block.
> > >>>       >
> > >>>       > Thanks!
> > >>>       >
> > >>>
