Package: cloud.debian.org
Severity: grave
X-Debbugs-Cc: [email protected], [email protected]
User: [email protected]
Usertag: image

(The severity may not be set correctly, as I can only test on a
Raspberry Pi 4B and no other Raspberry Pi models. However, this very
likely affects all RPi 4B machines.)

The Raspberry Pi Trixie cloud images currently use a hybrid MBR/GPT
partition layout, such that the GPT partition table has two partitions
(one for the FAT32 firmware partition, one for the root filesystem), and
the MBR partition table has three (one for the FAT32 firmware partition,
exactly overlapping the corresponding partition in the GPT table, one
protective partition spanning from sector 1 to the beginning of the
firmware partition, and one protective partition spanning from the
sector after the firmware partition to the end of the cloud image).

It appears that the intent is to allow the Raspberry Pi's firmware to
boot from the FAT32 partition listed in the MBR. Linux is then supposed
to notice the GPT table and use it. Unfortunately, this does not work
with the version of raspi-firmware in Trixie. The firmware files in
Trixie (start*.elf, fixup*.dat) are rather particular about what kind of
partition table they will boot from:

* Neither the first partition nor the second partition may be of type
  0xEE (the GPT protective MBR partition type). If either partition is
  of this type, the system will hang on the "rainbow screen" and
  repeatedly blink the SD card access LED seven times, with two-second
  pauses between each batch of blinks. (According to Raspberry Pi
  documentation, this is an error code indicating that the kernel could
  not be found.) This behavior isn't documented anywhere I know of; I
  discovered it by trial and error.
* The first partition slot must not be blank. If it is, the Pi won't
  even get to the rainbow screen; the disk access LED will flash very
  briefly and then stop.

Because the second partition in the Trixie cloud image's MBR table is a
GPT protective MBR partition, the image is unbootable.
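To make the two rules above easier to check against an image, here is a
minimal Python sketch that reads the four MBR slots from the first
sector and reports which rule is violated. It only encodes the behavior
I observed by trial and error, not any documented firmware logic. The
names (mbr_entries, boot_problems, make_mbr), the FAT32 type code 0x0C,
the sector numbers in the usage note, and the assumption that "blank"
means all 16 bytes of the slot are zero are mine, for illustration.

```python
import struct

MBR_TABLE_OFFSET = 446   # partition table starts here in sector 0
MBR_ENTRY_SIZE = 16      # four 16-byte slots
GPT_PROTECTIVE = 0xEE
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, first LBA, count


def mbr_entries(sector):
    """Return [(type_code, first_lba, sector_count)] for the four MBR slots."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA MBR signature")
    entries = []
    for slot in range(4):
        start = MBR_TABLE_OFFSET + slot * MBR_ENTRY_SIZE
        _status, _chs0, type_code, _chs1, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + MBR_ENTRY_SIZE])
        entries.append((type_code, first_lba, count))
    return entries


def boot_problems(sector):
    """List the reasons the observed firmware behavior predicts a failure."""
    entries = mbr_entries(sector)
    problems = []
    # Assumption: "blank" means every byte of slot 1 is zero.
    first_slot = sector[MBR_TABLE_OFFSET:MBR_TABLE_OFFSET + MBR_ENTRY_SIZE]
    if first_slot == bytes(MBR_ENTRY_SIZE):
        problems.append("slot 1 is blank")
    for slot in (0, 1):
        if entries[slot][0] == GPT_PROTECTIVE:
            problems.append(f"slot {slot + 1} has type 0xEE")
    return problems


def make_mbr(slots):
    """Build a 512-byte MBR from up to four (type, first_lba, count) tuples.

    Illustrative helper only; CHS fields and the status byte are left zero.
    """
    sector = bytearray(512)
    for i, (type_code, first_lba, count) in enumerate(slots):
        struct.pack_into(ENTRY_FORMAT, sector,
                         MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE,
                         0, bytes(3), type_code, bytes(3), first_lba, count)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```

For a layout shaped like the current image (made-up sector numbers),
boot_problems(make_mbr([(0x0C, 8192, 1048576), (0xEE, 1, 8191),
(0xEE, 1056768, 5000000)])) reports only the slot 2 problem. To check a
real image, pass the first 512 bytes of the image file instead.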

There are a few ways to fix this that I've found so far:

* Upgrade the firmware on the image. The earliest working tag from
  https://github.com/raspberrypi/firmware that I found was 1.20250305,
  which was able to boot the Trixie live image with its existing
  partition layout. The other two tags I tested (1.20241126 and
  1.20240424, the version present on the Trixie image by default) failed
  with a "kernel not found" error code as described above.
* Modify the MBR table such that the first partition slot contains a
  FAT32 partition for the firmware partition, the second partition slot
  is blank (all zeros), and the third partition slot is a GPT protective
  MBR partition spanning from sector 1 to the sector before the firmware
  partition. For some reason this is enough to get the Raspberry Pi
  firmware to find the kernel and boot it properly. The protective
  partition on the third slot makes Linux recognize the disk as being
  GPT-formatted, so it finds the root filesystem successfully.
* Switch to an MBR-only partition scheme.
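The second option can be sketched as a pure transformation of the first
sector. This is only an illustration of the slot layout described
above, not a tested image-build change: the function name
apply_workaround is hypothetical, FAT32 type codes 0x0B/0x0C are
assumed for the firmware partition, and the CHS fields of the
protective entry are simply left zero, which may or may not matter to
the firmware or the kernel.

```python
import struct

TABLE_OFFSET = 446
ENTRY_SIZE = 16
FAT32_TYPES = (0x0B, 0x0C)   # assumed type codes for the firmware partition
GPT_PROTECTIVE = 0xEE


def apply_workaround(sector: bytes) -> bytes:
    """Return a copy of the MBR sector with the rearranged partition table.

    Slot 1: the existing FAT32 firmware entry, copied byte for byte.
    Slot 2: blank (all zeros).
    Slot 3: 0xEE entry from sector 1 to the sector before the firmware
            partition.
    Slot 4: blank.
    """
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("expected a 512-byte sector with the 0x55AA signature")

    raw_entries = [
        sector[TABLE_OFFSET + i * ENTRY_SIZE:TABLE_OFFSET + (i + 1) * ENTRY_SIZE]
        for i in range(4)
    ]
    # Byte 4 of each entry is the partition type code.
    firmware = next((e for e in raw_entries if e[4] in FAT32_TYPES), None)
    if firmware is None:
        raise ValueError("no FAT32 entry found in the MBR table")

    # Bytes 8..11 hold the first LBA (little-endian).
    first_lba = struct.unpack_from("<I", firmware, 8)[0]
    if first_lba < 2:
        raise ValueError("no room for a protective entry before the firmware partition")

    # Sectors 1 .. first_lba - 1 inclusive, so the count is first_lba - 1.
    protective = struct.pack("<B3sB3sII", 0, bytes(3), GPT_PROTECTIVE,
                             bytes(3), 1, first_lba - 1)

    table = firmware + bytes(ENTRY_SIZE) + protective + bytes(ENTRY_SIZE)
    # Keep the boot code area and the signature untouched.
    return sector[:TABLE_OFFSET] + table + sector[510:]
```

The function does not touch the GPT itself, so the root filesystem
entry that Linux reads from the GPT stays as it is. Writing the result
back to a real image should of course only be done on a copy.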

Of the three, the second seems the most promising but also the most
dangerous (as a behavior change in either the kernel or the firmware
could break this in future release upgrades). The first one probably
isn't an option since there is no way to cherry-pick the fix for the
boot issue, given that the firmware is closed-source. The last one is
likely safest.
