Subject says it all really, is this an option at this time?

We'd like to try to boot the main ZFS root partition and then fall back to a small UFS-based recovery partition. Is that possible?

I know we could use GRUB, but I'd prefer to keep it in the family.

It is, sure, but there is a compromise to be made for it.

Let's start with what I have done in the illumos port, as the idea there is exactly to have binaries that are as “universal” as possible (only the binaries are listed below, to show the sizes):

-r-xr-xr-x   1 root     sys       171008 apr 30 19:55 bootia32.efi
-r-xr-xr-x   1 root     sys       148992 apr 30 19:55 bootx64.efi
-r--r--r--   1 root     sys         1255 okt 25  2015 cdboot
-r--r--r--   1 root     sys       154112 apr 30 19:55 gptzfsboot
-r-xr-xr-x   1 root     sys       482293 mai  2 21:10 loader32.efi
-r-xr-xr-x   1 root     sys       499218 mai  2 21:10 loader64.efi
-r--r--r--   1 root     sys          512 okt 15  2015 pmbr
-r--r--r--   1 root     sys       377344 mai  2 21:10 pxeboot
-r--r--r--   1 root     sys       376832 mai  2 21:10 zfsloader

The loader (BIOS/EFI) is built with the full complement: zfs, ufs, dosfs, cd9660, nfs, tftp + gzipfs. The cdboot starts zfsloader (that's a trivial string change).

The gptzfsboot in the illumos case is built with only zfs, dosfs and ufs, as it only has to support disk-based media to read in the loader. I am also building gptzfsboot with libstand and libi386 to get as much shared code as possible, which has both good and bad sides, as usual ;)

The gptzfsboot size means that with ufs a dedicated boot partition (freebsd-boot) is needed; with zfs the illumos port always uses the 3.5MB boot area after the first 2 labels (as there is no geli, illumos does not need a dedicated boot partition with zfs).

As freebsd-boot is currently created at 512k, the size is not an issue. Using common code also allows the generic partition code to be used, so GPT/MBR/BSD (VTOC in the illumos case) labels are not a problem.
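To make the size argument concrete, here is a rough sketch of the arithmetic. The gptzfsboot size is taken from the illumos listing above; reading "512k" as 512 KiB and "3.5MB" as 3584 KiB is an assumption, and the function names are made up for illustration.

```python
KIB = 1024

# Size in bytes of gptzfsboot, copied from the illumos listing in this thread.
GPTZFSBOOT_BYTES = 154112

# Space available for the boot block in each layout mentioned above.
# Assumption: "512k" means 512 KiB and "3.5MB" means 3584 KiB.
FREEBSD_BOOT_BYTES = 512 * KIB      # dedicated freebsd-boot partition
ZFS_BOOT_AREA_BYTES = 3584 * KIB    # boot area after the first two ZFS labels


def headroom(image_bytes, area_bytes):
    """Return the bytes left over after placing the image (negative if it does not fit)."""
    return area_bytes - image_bytes


def fits(image_bytes, area_bytes):
    """Return True when the boot block fits in the given area."""
    return headroom(image_bytes, area_bytes) >= 0


if __name__ == "__main__":
    for name, area in (("freebsd-boot", FREEBSD_BOOT_BYTES),
                       ("zfs boot area", ZFS_BOOT_AREA_BYTES)):
        print(name, fits(GPTZFSBOOT_BYTES, area), headroom(GPTZFSBOOT_BYTES, area))
```

With these numbers the illumos-built gptzfsboot leaves well over half of a 512k freebsd-boot partition free.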

So, even with just a CD boot (ISO) starting zfsloader (which in FreeBSD has ufs, zfs etc. built in), you already get rescue capability.

Now, even just by adding the ufs reader to gptzfsboot, we can use gpt + freebsd-boot and a ufs root while loading zfsloader on a USB image, so it can be used for both live/install and rescue, because zfsloader itself has support for all file systems + partition types.

I have kept myself a bit away from the FreeBSD gptzfsboot for a simple reason: older setups have a smaller freebsd-boot partition, and not everyone is necessarily happy about size changes :D In the FreeBSD case there is also another factor called geli; it most certainly contributes some bits, but it also needs to be properly addressed in the IO call stack (as we have seen with the zfsbootcfg bits). But then again, shared code can help reduce the complexity here too.

Yes, the zfsloader/loader*.efi in the listing above is actually built with framebuffer code, a compiled-in 8x16 default font (lz4-compressed ascii + box drawing basically; because zfs has lz4, the decompressor is always there), and ficl 4.1, so that's a bit of a difference from the FreeBSD loader.

Also note that we can still build the smaller dedicated blocks like boot2; we just cannot use those blocks for the more universal cases, and eventually those special cases will diminish.

thanks for that..

So, here's the exact problem I need to solve:
FreeBSD 10 (or newer) on Amazon EC2.
We need to have a plan for recovering from the scenario where something goes wrong (e.g. during an upgrade) and we are left with a system where the default zpool rootfs points to a dataset that doesn't boot. It is possible that maybe the entire pool is unbootable into multi-user. Maybe somehow it filled up? Who knows. It's hard to predict future problems. There is no console access at all, so there is no possibility of human intervention. So all recovery paths that start "enter single user mode and...." are unusable.

The customers who own the Amazon account are not crazy about giving us the keys to the kingdom as far as all their EC2 instances go, so taking a root drive off a 'sick' VM and grafting it onto a FreeBSD instance to 'repair' it becomes a task we don't really want to have to ask them to do. They may not have the in-house expertise to do it confidently.

This leaves us with automatic recovery, or at least automatic methods of getting access to that drive from the network. Since the regular root is zfs, my gut feeling is that to reduce the chances of confusion during recovery, I'd like the (recovery) system itself to be running off a UFS partition, and potentially with a memory root filesystem. As long as it can be reached over the network, we can then take over.

We'd also like to have boot environment support in the bootcode.
So, what would be the minimum set we'd need?

UFS support, ZFS support, BE support, and support for selecting a completely different boot procedure after some number of boot attempts that fail to get all the way to multi-user.
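To pin down what I mean by the last item, here is a minimal sketch of the decision logic only. It is not an existing loader feature; the names, the threshold of 3, and the idea of a persistent counter that the boot code bumps and multi-user startup clears are all assumptions for illustration.

```python
PRIMARY = "zfs-root"        # hypothetical label for the normal ZFS boot path
RECOVERY = "ufs-recovery"   # hypothetical label for the small UFS rescue partition


def choose_boot_target(attempts, max_attempts=3):
    """Pick the normal path until max_attempts boots have failed to complete."""
    return PRIMARY if attempts < max_attempts else RECOVERY


def loader_stage(state, max_attempts=3):
    """Simulate the boot code: choose a target and count the attempt.

    state is a dict standing in for whatever persistent storage would hold
    the counter across reboots (an assumption; the thread does not say where).
    """
    target = choose_boot_target(state["attempts"], max_attempts)
    if target == PRIMARY:
        # Count the attempt before handing over, so a hang still gets recorded.
        state["attempts"] += 1
    return target


def multiuser_reached(state):
    """Simulate the late startup step that declares the boot a success."""
    state["attempts"] = 0


if __name__ == "__main__":
    state = {"attempts": 0}
    # Four boots in a row that never reach multi-user.
    print([loader_stage(state) for _ in range(4)])
```

The counter is incremented before the kernel is started rather than after a failure is detected, because with no console there is nothing around to notice a hang.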

How does that come out size-wise? And what do I need to configure to get that?

The current EC2 instances have a 64kB boot partition, but I have a window to convince management to expand that if I have a good enough argument (since we are doing a repartition on the next upgrade, which is "special": it's our upgrade to 10.3 from 8.0). Being able to self-heal, or at least 'get at' a sick instance, might be a good enough argument and would make the EC2 instances the same as all the other versions of the product.
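As a rough sanity check for the size argument, taking the illumos-built gptzfsboot from the listing above as a stand-in (a FreeBSD build with the same features will differ in size, so this is only indicative). Treating 64kB as 64 KiB and assuming 512-byte sectors are my assumptions.

```python
SECTOR_BYTES = 512                   # assumed sector size
CURRENT_PARTITION_BYTES = 64 * 1024  # assumption: "64kB" means 64 KiB
GPTZFSBOOT_BYTES = 154112            # illumos build, from the listing in this thread


def sectors_needed(image_bytes, sector_bytes=SECTOR_BYTES):
    """Round the image size up to whole sectors."""
    return -(-image_bytes // sector_bytes)


def shortfall(image_bytes, partition_bytes):
    """Bytes by which the image exceeds the partition (0 if it fits)."""
    return max(0, image_bytes - partition_bytes)


if __name__ == "__main__":
    print("sectors needed:", sectors_needed(GPTZFSBOOT_BYTES))
    print("sectors available:", CURRENT_PARTITION_BYTES // SECTOR_BYTES)
    print("shortfall in bytes:", shortfall(GPTZFSBOOT_BYTES, CURRENT_PARTITION_BYTES))
```

So a boot block of that size would need more than twice the current partition, which is the number I'd take to management.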

/me has thought.. I wonder if the ec2 instance bios has enough network support to allow PXE-like behaviour? or at least being able to receive packets..?


