On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> On 2021/2/6 9:57 AM, Erik Jensen wrote:
> > On Wed, Feb 3, 2021 at 10:16 PM Erik Jensen <erikjen...@rkjnsn.net> wrote:
> >> On Sun, Jan 31, 2021 at 9:50 PM Su Yue <l...@damenly.su> wrote:
> >>> On Mon 01 Feb 2021 at 10:35, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> >>>> On 2021/1/29 2:39 PM, Erik Jensen wrote:
> >>>>> On Mon, Jan 25, 2021 at 8:54 PM Erik Jensen <erikjen...@rkjnsn.net> wrote:
> >>>>>> On Wed, Jan 20, 2021 at 1:08 AM Erik Jensen <erikjen...@rkjnsn.net> wrote:
> >>>>>>> On Wed, Jan 20, 2021 at 12:31 AM Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> >>>>>>>> On 2021/1/20 4:21 PM, Qu Wenruo wrote:
> >>>>>>>>> On 2021/1/19 5:28 PM, Erik Jensen wrote:
> >>>>>>>>>> On Mon, Jan 18, 2021 at 9:22 PM Erik Jensen <erikjen...@rkjnsn.net> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 18, 2021 at 4:12 AM Erik Jensen <erikjen...@rkjnsn.net> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> The offending system is indeed ARMv7 (specifically a
> >>>>>>>>>>>> Marvell ARMADA® 388), but I believe the Broadcom
> >>>>>>>>>>>> BCM2835 in my Raspberry Pi is actually ARMv6 (with
> >>>>>>>>>>>> hardware float support).
> >>>>>>>>>>>
> >>>>>>>>>>> Using NBD, I have verified that I receive the same error
> >>>>>>>>>>> when attempting to mount the filesystem on my ARMv6
> >>>>>>>>>>> Raspberry Pi:
> >>>>>>>>>>> [ 3491.339572] BTRFS info (device dm-4): disk space caching is enabled
> >>>>>>>>>>> [ 3491.394584] BTRFS info (device dm-4): has skinny extents
> >>>>>>>>>>> [ 3492.385095] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095
> >>>>>>>>>>> [ 3492.514071] BTRFS error (device dm-4): bad tree block start, want 26207780683776 have 3395945502747707095
> >>>>>>>>>>> [ 3492.553599] BTRFS warning (device dm-4): failed to read tree root
> >>>>>>>>>>> [ 3492.865368] BTRFS error (device dm-4): open_ctree failed
> >>>>>>>>>>>
> >>>>>>>>>>> The Raspberry Pi is running Linux 5.4.83.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Okay, after some more testing, ARM seems to be irrelevant,
> >>>>>>>>>> and 32-bit is the key factor. On a whim, I booted up an
> >>>>>>>>>> i686, 5.8.14 kernel in a VM, attached the drives via NBD,
> >>>>>>>>>> ran cryptsetup, tried to mount, and… I got the exact same
> >>>>>>>>>> error message.
> >>>>>>>>>>
> >>>>>>>>> My educated guess is that on 32-bit platforms we passed an
> >>>>>>>>> incorrect sector into the bio, and thus got garbage back.
> >>>>>>>>
> >>>>>>>> To prove that, you can use the bcc tools to verify it.
> >>>>>>>> biosnoop can do that:
> >>>>>>>> https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
> >>>>>>>>
> >>>>>>>> Just try mounting the fs with biosnoop running.
> >>>>>>>> With "btrfs ins dump-tree -t chunk <dev>", we can manually
> >>>>>>>> calculate the offset of each read to see if they match.
> >>>>>>>> If they don't match, it would prove my assumption and give
> >>>>>>>> us a pretty good clue for a fix.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Qu
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Is this bug happening only on this fs, or can any other
> >>>>>>>>> btrfs also trigger similar problems on 32-bit platforms?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Qu
> >>>>>>>
> >>>>>>> I have only observed this error on this file system.
> >>>>>>> Additionally, the error mounting with the NAS only started
> >>>>>>> after I did a `btrfs replace` on all five 8TB drives using an
> >>>>>>> x86_64 system. (Ironically, I did this with the goal of making
> >>>>>>> it faster to use the filesystem on the NAS by re-encrypting
> >>>>>>> the drives to use a cipher supported by my NAS's crypto
> >>>>>>> accelerator.)
> >>>>>>>
> >>>>>>> Maybe this process of shuffling 40TB around caused some value
> >>>>>>> in the filesystem to increment to the point that a calculation
> >>>>>>> using it overflows on 32-bit systems?
> >>>>>>>
> >>>>>>> I should be able to try biosnoop later this week, and I'll
> >>>>>>> report back with the results.
> >>>>>>
> >>>>>> Okay, I tried running biosnoop, but I seem to be running into
> >>>>>> this bug: https://github.com/iovisor/bcc/issues/3241 (That bug
> >>>>>> was reported for cpudist, but I'm seeing the same error when I
> >>>>>> try to run biosnoop.)
> >>>>>>
> >>>>>> Anything else I can try?
> >>>>>
> >>>>> Is it possible to add printks to retrieve the same data?
> >>>>>
> >>>> Sorry for the late reply, I've been busy testing the subpage
> >>>> patchset. (And unfortunately not much progress.)
> >>>>
> >>>> If bcc is not possible, you can still use ftrace events, but
> >>>> unfortunately I didn't find a good enough one. (In fact, the
> >>>> trace events for the block layer are pretty limited.)
> >>>>
> >>>> You can try to add printk()s in function blk_account_io_done()
> >>>> to emulate what's done in function trace_req_completion() of
> >>>> biosnoop.
> >>>>
> >>>> The time delta is not important, we only need the device name,
> >>>> sector and length.
> >>>>
> >>>
> >>> Tip: There are ftrace events called block:block_rq_issue and
> >>> block:block_rq_complete to fetch that information. No need to
> >>> add printk().
> >>>
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>
> >>
> >> Okay, here's the output of the trace:
> >> https://gist.github.com/rkjnsn/4cf606874962b5a0284249b2f2e934f5
> >>
> >> And here's the output of dump-tree:
> >> https://gist.github.com/rkjnsn/630b558eaf90369478d670a1cb54b40f
> >>
> >> One important note is that ftrace only captured requests at the
> >> underlying block device (nbd, in this case), not at the device mapper
> >> level. The encryption header on these drives is 16 MiB, so the offset
> >> reported in the trace will be 16777216 bytes larger than the offset
> >> btrfs was actually trying to read at the time.
> >>
> >> In case it's helpful, I believe this is the mapping of which
> >> (encrypted) nbd device node in the trace corresponds to which
> >> (decrypted) filesystem device:
> >>   43,0    33c75e20-26f2-4328-a565-5ef3484832aa
> >>   43,32   9bdfdb8f-abfb-47c5-90af-d360d754a958
> >>   43,64   39a9463d-65f5-499b-bca8-dae6b52eb729
> >>   43,96   f1174dea-ea10-42f2-96b4-4589a2980684
> >>   43,128  e669d804-6ea2-4516-8536-1d266f88ebad
> >
> > What are the chances it's something simple like a long getting used
> > somewhere in the code that should actually be a 64-bit int?
> >
> That's what I expected, but I didn't find anything obviously suspicious yet.
>
> Unfortunately I didn't get much useful info from the trace events,
> as a lot of the values don't even make sense to me....
>
> But the chunk tree dump proves to be more useful.
>
> Firstly, the offending tree block doesn't even occur in any of the chunk ranges.
>
> The offending tree block is 26207780683776, but the tree dump doesn't
> have any range there.
>
> The highest chunk is at 5958289850368 + 4294967296, still one digit
> lower than the expected value.
>
> I'm surprised we didn't even get any error for that, thus it may
> indicate our chunk mapping is incorrect too.
>
> Would you please try the following diff on the 32bit system and report
> back the dmesg?
>
> The diff adds the following debug output:
> - when we try to read one tree block
> - when a bio is mapped to read device
> - when a new chunk is added to chunk tree
>
> Thanks,
> Qu
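
For anyone following along, the arithmetic above can be checked with a short Python sketch. The three numbers are the ones quoted in this thread. The second half is only an illustration of the "long that should be 64-bit" idea: it assumes 4 KiB pages and a 32-bit index derived from the byte offset, which is my assumption and not something anyone has confirmed in the btrfs code.

```python
# Values quoted in the thread.
WANTED_BYTENR = 26207780683776       # "want" in the bad tree block error
HIGHEST_CHUNK_START = 5958289850368  # highest chunk in the dump-tree output
HIGHEST_CHUNK_LENGTH = 4294967296    # 4 GiB


def in_chunk(bytenr, start, length):
    """Return True if bytenr falls inside the range [start, start + length)."""
    return start <= bytenr < start + length


highest_chunk_end = HIGHEST_CHUNK_START + HIGHEST_CHUNK_LENGTH
print("highest chunk end:", highest_chunk_end)
print("wanted block inside highest chunk:",
      in_chunk(WANTED_BYTENR, HIGHEST_CHUNK_START, HIGHEST_CHUNK_LENGTH))
print("wanted block lies past the chunk end by:",
      WANTED_BYTENR - highest_chunk_end)

# ASSUMPTION (illustration only): 4 KiB pages and an index that is stored
# in a 32-bit unsigned type somewhere along the read path.
PAGE_SHIFT = 12
MASK_32 = 0xFFFFFFFF


def page_index_64bit(bytenr):
    """Page index as it would be computed with a 64-bit type."""
    return bytenr >> PAGE_SHIFT


def page_index_32bit(bytenr):
    """The same index after being truncated to 32 bits."""
    return (bytenr >> PAGE_SHIFT) & MASK_32


for name, value in (("highest chunk start", HIGHEST_CHUNK_START),
                    ("wanted tree block", WANTED_BYTENR)):
    full = page_index_64bit(value)
    cut = page_index_32bit(value)
    print(name, "64-bit index:", full, "32-bit index:", cut,
          "truncated:", full != cut)
```

Under that assumption, only byte offsets at or above 2^44 (16 TiB) would be affected by the truncation, so this is a way to test the hypothesis, not evidence for it.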
Okay, here's the dmesg output from attempting to mount the filesystem:
https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20

I captured this on my 32-bit x86 VM, as it's much faster to rebuild the
kernel there than on my ARM board, and it fails with the same error.
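
In case it helps with cross-checking the earlier ftrace output against the chunk tree, here is a rough Python sketch of the offset conversion described above. It assumes the sector numbers in the block trace events are in 512-byte units. The function name and the example sector value are made up for illustration; the device mapping and the 16 MiB header size are the ones given earlier in the thread.

```python
SECTOR_SIZE = 512                           # assumed unit of trace sector numbers
ENCRYPTION_HEADER_BYTES = 16 * 1024 * 1024  # 16777216, as stated in the thread

# nbd device (major,minor) in the trace -> decrypted filesystem device,
# copied from the mapping posted earlier in the thread.
DEVICE_MAP = {
    "43,0": "33c75e20-26f2-4328-a565-5ef3484832aa",
    "43,32": "9bdfdb8f-abfb-47c5-90af-d360d754a958",
    "43,64": "39a9463d-65f5-499b-bca8-dae6b52eb729",
    "43,96": "f1174dea-ea10-42f2-96b4-4589a2980684",
    "43,128": "e669d804-6ea2-4516-8536-1d266f88ebad",
}


def nbd_sector_to_dm_offset(sector):
    """Convert a sector seen on the nbd device to a byte offset on the
    decrypted device by subtracting the encryption header."""
    nbd_offset = sector * SECTOR_SIZE
    if nbd_offset < ENCRYPTION_HEADER_BYTES:
        raise ValueError("sector lies inside the encryption header")
    return nbd_offset - ENCRYPTION_HEADER_BYTES


# Hypothetical example: sector 32776 on device 43,0.
example_sector = 32776
print(DEVICE_MAP["43,0"], nbd_sector_to_dm_offset(example_sector))
```

The result is the physical offset on the decrypted device, which still has to be matched against the stripe offsets in the chunk tree dump to get back to a logical address.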