在 2024/4/22 16:07, Sachi King 写道:
Hi,

I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain
it's a bug and not a corruption issue.

A bit of history on the filesystem.  It is a fairly new filesystem as it was
being used to give me access to test a wayland application on a
Raspberry Pi.  The filesystem was about 3 days old when I hit the bug, and
I'm fairly certain it never had an unclean shutdown.  I have checked the
filesystem with "btrfs check" which has found no errors.  The filesystem
mounts on Linux and is functional.

# btrfs check --check-data-csum /dev/sda2
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking csums against data
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 5070573568 bytes used, no error found
total csum bytes: 4451620
total tree bytes: 370458624
total fs tree bytes: 353124352
total extent tree bytes: 10010624
btree space waste bytes: 62303284
file data blocks allocated: 6786519040
  referenced 6328619008

Since btrfs check reports no error, the fs must be valid.

But considering how new it is, it may be related to some new features
not properly implemented in Uboot.

Is it possible to provide the whole binary dump of the fs?



I've made an image of the filesystem so I could reproduce the bug in an
environment that doesn't require the physical SBC, and have reproduced
the issue using the head of the master branch with "qemu-x86_64_defconfig".

My testing qemu was produced with the following:
# make qemu-x86_64_defconfig
# cat << EOF >> .config
CONFIG_AUTOBOOT=y
CONFIG_BOOTDELAY=1
CONFIG_USE_BOOTCOMMAND=y
CONFIG_BOOTSTD_DEFAULTS=y
CONFIG_BOOTSTD_FULL=y
CONFIG_CMD_BOOTFLOW_FULL=y
CONFIG_BOOTCOMMAND="bootflow scan -lb"
CONFIG_ENV_IS_NOWHERE=y
CONFIG_LZ4=y
CONFIG_BZIP2=y
CONFIG_ZSTD=y
CONFIG_FS_BTRFS=y
CONFIG_CMD_BTRFS=y
CONFIG_GZIP=y
CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi"
EOF
# make -j24

bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains:
/ {
         bootstd {
                 compatible = "u-boot,boot-std";
                 filename-prefixes = "/@boot/", "/boot/", "/";
                 bootdev-order = "scsi";
                 extlinux {
                         compatible = "u-boot,extlinux";
                 };
         };
};


The VM was run with
qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action reboot=shutdown 
-drive file=/mnt/dbg/u-boot-debug.img

The error message I recive on boot is
BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()!
BUG!
resetting ...


I added a print statement to free_extent_buffer_internal that prints the
start address of the extent_buffer as I'm not sure what to be looking for
here.  This print statement is before the decrement.
printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", eb->start, 
eb->refs);

The last message before the crash reported eb->start to be "0", with 0 refs.
free_extent_buffer_internal: eb->start[0] eb->refs[0]

The extent at 0 struck me as odd, so I tried commenting out the freeing, by
removing the call to free_extent_buffer_final, and this resulted in bootflow
succeeding and showing me the boot menu, which suprised me.
I expected to see the bug reproduce itself, with refs being zero, but eb->start
pointing somewhere valid, but I instead got a valid address with refs at 2.

I'm assuming that the order free_extent_buffer_internal is called is
deterministic, so by counting the print outputs the line that prior held
the extent_buffer with a 0 start was replaced with:
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]

Interestingly, as can be seen in the full logs with my included print
messages, 249c000 is being used just before this, with a ref count of
2.  249c000 does appear to reach a point where it should have been freed
in the past, before it gets used again as seen in both logs.

The failing boot log:
U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)
Trying to boot from SPI
Jumping to 64-bit U-Boot: Note many features are missing


U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000)

CPU:   QEMU Virtual CPU version 2.5+
DRAM:  128 MiB
Core:  13 devices, 13 uclasses, devicetree: separate
Loading Environment from nowhere... OK
Model: QEMU x86 (I440FX)
Net:   e1000: 00:00:00:00:00:00

Error: e1000#0 No valid MAC address found.
       eth_initialize() No ethernet found.


Hit any key to stop autoboot:  0
Scanning for bootflows in all bootdevs
Seq  Method       State   Uclass    Part  Name                      Filename
---  -----------  ------  --------  ----  ------------------------  
----------------
scanning bus for devices...
Target spinup took 0 ms.
SATA link 1 timeout.
Target spinup took 0 ms.
SATA link 3 timeout.
SATA link 4 timeout.
SATA link 5 timeout.
AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
flags: 64bit ncq only
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[1]
free_extent_buffer_internal: eb->start[6de0000] eb->refs[1]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[1]
   Device 0: (0:0) Vendor: ATA Prod.: QEMU HARDDISK Rev: 2.5+
             Type: Hard Disk
             Capacity: 58680.0 MB = 57.3 GB (120176640 x 512)
timeout exit!
Scanning bootdev 'ahci_scsi.id0lun0.bootdev':
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[1]
free_extent_buffer_internal: eb->start[6de0000] eb->refs[1]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[1]
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[0] eb->refs[0]
BUG at fs/btrfs/extent-io.c:626/free_extent_buffer_internal()!

The eb[0] seems very weird.
Even for superblock reading, we got eb at 64K, and every eb should have
at least one ref at creation time.

So this indeed looks like the direct cause.
But without a full call trace (is it possible inside U-boot runtime?), I
do not have an immediate clue.

Thanks,
Qu

BUG!
resetting ...

The succeeding log with the removed freeing:
U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:38:25 +1000)
Trying to boot from SPI
Jumping to 64-bit U-Boot: Note many features are missing


U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:38:25 +1000)

CPU:   QEMU Virtual CPU version 2.5+
DRAM:  128 MiB
Core:  13 devices, 13 uclasses, devicetree: separate
Loading Environment from nowhere... OK
Model: QEMU x86 (I440FX)
Net:   e1000: 00:00:00:00:00:00

Error: e1000#0 No valid MAC address found.
       eth_initialize() No ethernet found.


Hit any key to stop autoboot:  0
Scanning for bootflows in all bootdevs
Seq  Method       State   Uclass    Part  Name                      Filename
---  -----------  ------  --------  ----  ------------------------  
----------------
scanning bus for devices...
Target spinup took 0 ms.
SATA link 1 timeout.
Target spinup took 0 ms.
SATA link 3 timeout.
SATA link 4 timeout.
SATA link 5 timeout.
AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
flags: 64bit ncq only
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[1]
free_extent_buffer_internal: eb->start[6de0000] eb->refs[1]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[1]
   Device 0: (0:0) Vendor: ATA Prod.: QEMU HARDDISK Rev: 2.5+
             Type: Hard Disk
             Capacity: 58680.0 MB = 57.3 GB (120176640 x 512)
timeout exit!
Scanning bootdev 'ahci_scsi.id0lun0.bootdev':
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[1]
free_extent_buffer_internal: eb->start[6de0000] eb->refs[1]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[1]
free_extent_buffer_internal: eb->start[10000] eb->refs[1]
free_extent_buffer_internal: eb->start[1548000] eb->refs[1]
free_extent_buffer_internal: eb->start[150c000] eb->refs[1]
free_extent_buffer_internal: eb->start[154c000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a0000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[24a4000] eb->refs[1]
free_extent_buffer_internal: eb->start[249c000] eb->refs[2]
free_extent_buffer_internal: eb->start[2490000] eb->refs[1]
free_extent_buffer_internal: eb->start[6de0000] eb->refs[1]
free_extent_buffer_internal: eb->start[28f4000] eb->refs[1]
free_extent_buffer_internal: eb->start[1544000] eb->refs[1]
   0  extlinux     ready   scsi         2  ahci_scsi.id0lun0.bootdev 
/@boot/extlinux/extlinux.conf
** Booting bootflow 'ahci_scsi.id0lun0.bootdev.part_2' with extlinux
------------------------------------------------------------
1:      NixOS - Default
2:      NixOS - Configuration 5 (2024-03-04 11:00 - 24.05.20240205.faf912b)
3:      NixOS - Configuration 4 (2024-03-02 15:05 - 24.05.20240205.faf912b)
4:      NixOS - Configuration 3 (2024-03-02 14:26 - 24.05.20240205.faf912b)
5:      NixOS - Configuration 2 (2024-03-02 14:06 - 24.05.20240205.faf912b)
6:      NixOS - Configuration 1 (1970-01-01 10:00 - 24.05.20240205.faf912b)
Enter choice: 1:        NixOS - Default
Retrieving file: 
/@boot/extlinux/../nixos/gq8jsgxnhq2wvsjrni3cjw1wb5540wjw-linux-6.1.63-stable_20231123-Image
free_extent_buffer_internal: eb->start[10000] eb->refs[1]


I'm not sure where to go from here with the bug, so I'm hoping for some
help in tracking it down so it can be fixed.

Thanks,
Sachi



Reply via email to