So, while working on my UEFI OS loader, I managed to brick a couple of
machines. I recovered all but one of them so far (sadly that one is my
personal tablet -- there may yet be a way to save it but I'm holding off on it
until I'm better prepared for the attempt).
I want people to be able to set their machines to boot VxWorks via UEFI
unattended, preferably in a consistent fashion. (Consistent, because there
seems to be enough variation in setup UIs that I can't count on the user
always being able to figure it out.)
Now, I know UEFI handles boot path selection using the BootXXXX and BootOrder
variables, and I know that OSes can use the GetVariable()/SetVariable()
runtime services to set up the boot order during install.
Having VxWorks call the UEFI runtime service routines is a bit of a hassle
though (especially if you're running a 32-bit VxWorks on a 64-bit UEFI system)
and being an embedded OS we don't go through the same installation process as
a self-hosted OS like Windows anyway, so instead I made it possible to
configure the boot path from the OS loader itself. After the OS loader starts, you can
hit a key to pause the OS launch and then enter a small boot manager menu
where you can add or remove boot path entries.
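For reference, BootOrder is an array of UINT16 option numbers, and each number N names a variable Boot#### where #### is N as four uppercase hex digits (so 0x0001 is Boot0001). Below is a minimal sketch of what "add a boot path entry" does to BootOrder, modelled on a plain in-memory array. boot_var_name() and boot_order_prepend() are hypothetical helpers; in a real loader the array would be read and written with GetVariable()/SetVariable() using the EFI global variable GUID and the NON_VOLATILE | BOOTSERVICE_ACCESS | RUNTIME_ACCESS attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build the variable name for option number n, e.g. 0x001A -> "Boot001A".
   Firmware uses CHAR16 names; plain char is used here for brevity. */
static void boot_var_name(uint16_t n, char out[9])
{
    snprintf(out, 9, "Boot%04X", (unsigned)n);
}

/* Make 'opt' the first entry of BootOrder so it is tried first.
   Any existing occurrence is dropped so the option appears only once.
   'order' must have room for count + 1 entries; returns the new count. */
static size_t boot_order_prepend(uint16_t *order, size_t count, uint16_t opt)
{
    size_t j = 0;

    for (size_t i = 0; i < count; i++)
        if (order[i] != opt)
            order[j++] = order[i];

    /* Shift the remaining entries up one slot and put opt in front. */
    memmove(&order[1], &order[0], j * sizeof(order[0]));
    order[0] = opt;
    return j + 1;
}
```

Dropping an existing occurrence before prepending is a choice made for this sketch so that re-adding an entry just moves it to the front.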
Note that I am testing with UEFI 2.3.x systems, but they're not particularly
recent. One example system is from Emerson Network Power and it has an AMI
BIOS from 2009. Actually, so far all three of the machines I managed to
clobber have AMI BIOSes, but are from different hardware vendors. (One is from
Dell.)
The tests I do involve setting the system to boot from a USB thumb drive. The
first case I encountered was with a Dell laptop. I was correctly
adding a new BootXXXX variable and updating the BootOrder, and booting from
the thumb drive worked. As near as I can figure, the problem was that the
routine to undo this change was faulty: it would delete the recently created
BootXXXX variable but fail to update the BootOrder. This left the laptop stuck
trying to access a BootXXXX variable that didn't exist. (For example, the
BootOrder could be saying 0001000200030004, yet there is no Boot0001 variable
present.) Hitting F2 caused it to say "Entering setup..." but it would never
get there. Similarly, hitting F12 would cause it to say "Preparing one-time
boot options..." (or something similar) but it would never reach that point
either.
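The wedged state boils down to a BootOrder entry with no matching Boot#### variable. A minimal sketch of how a tool (or a more defensive boot manager) could detect it, under the simplifying assumption that the set of existing Boot#### variables is available as a small lookup table; first_dangling_entry() and MODEL_OPTIONS are hypothetical names. With the example above (BootOrder 0001 0002 0003 0004, no Boot0001) it reports index 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: exists[n] is true when the Boot#### variable for
   option n is present. Only options 0..15 are modelled to keep it small. */
#define MODEL_OPTIONS 16

/* Return the index of the first BootOrder entry whose Boot#### variable is
   missing, or -1 if every entry resolves to an existing option. Options
   outside the modelled range are treated as missing. */
static int first_dangling_entry(const uint16_t *order, size_t count,
                                const bool exists[MODEL_OPTIONS])
{
    for (size_t i = 0; i < count; i++) {
        if (order[i] >= MODEL_OPTIONS || !exists[order[i]])
            return (int)i;
    }
    return -1;
}
```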
I finally managed to un-wedge it by disconnecting the hard disk (which luckily
was easy to do on this model laptop as it was designed to be upgraded). After
I did that and rebooted the machine a couple of times, it finally seemed to go
back to normal. I suspect that unplugging the disk caused the firmware to
re-probe the available boot devices and finally it updated the BootOrder such
that it threw out the bogus entry.
I found the bug in my code that was preventing the boot path removal from
resetting BootOrder correctly and fixed it, and that took care of the issue. I
was a bit annoyed that I had almost irretrievably bricked the machine
(especially since it was on loan from IT) but once I had it back I forgot
about it.
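A correct removal has to touch both variables. Here is a sketch of one against a hypothetical in-memory stand-in for NVRAM (boot_vars and remove_boot_option() are illustrative names, not VxWorks or EDK II code). Per the UEFI spec, calling SetVariable() with a DataSize of zero deletes a variable, which is what the Boot#### step would map to in real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_OPTIONS 16

/* Hypothetical in-memory stand-in for the NVRAM variables involved. */
typedef struct {
    uint16_t order[MODEL_OPTIONS];  /* BootOrder */
    size_t   count;                 /* number of BootOrder entries */
    bool     exists[MODEL_OPTIONS]; /* exists[n]: Boot#### for option n is present */
} boot_vars;

/* Remove boot option 'opt'. BootOrder is rewritten first, so that if the
   second step were interrupted the worst case is an orphaned Boot####
   variable rather than a BootOrder entry pointing at nothing. */
static void remove_boot_option(boot_vars *v, uint16_t opt)
{
    size_t j = 0;

    assert(opt < MODEL_OPTIONS);

    /* Step 1: drop every occurrence of opt, keeping the rest in order.
       Real code: write the shortened array back with SetVariable(). */
    for (size_t i = 0; i < v->count; i++)
        if (v->order[i] != opt)
            v->order[j++] = v->order[i];
    v->count = j;

    /* Step 2: delete the option itself.
       Real code: SetVariable() on Boot#### with DataSize 0. */
    v->exists[opt] = false;
}
```

Updating BootOrder before deleting Boot#### means an interruption between the two steps leaves, at worst, an unreferenced Boot#### variable instead of the dangling reference that wedged the Dell.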
Then recently I managed to introduce another bug which caused me to brick my
Emerson board and my tablet. The failure here was a little different. From
what I can tell in this case, I was adding an incorrectly formatted BootXXXX
variable. The "Firmware Boot Manager" section in chapter 11 of the Beyond BIOS
book describes the EFI_LOAD_OPTION structure. You're supposed to have a
Description field followed by a FilePathList array. The FilePathList is supposed
to immediately follow the null-terminated Unicode description string. In my
case, I was calculating the offset for the FilePathList incorrectly and
placing it farther away.
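Concretely, EFI_LOAD_OPTION is a packed UINT32 Attributes, a UINT16 FilePathListLength, the null-terminated CHAR16 Description, then the FilePathList (and optional data). So FilePathList starts at byte 6 + 2 * (StrLen(Description) + 1), with the terminator counted. A minimal serializer sketch, assuming the device path is passed in already encoded; the function names are hypothetical, and fields are written byte by byte in little-endian order to sidestep alignment and padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of EFI_LOAD_OPTION: UINT32 Attributes + UINT16
   FilePathListLength. The variable-length fields follow with no padding. */
#define LOAD_OPTION_HEADER_SIZE 6u

static size_t char16_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)v);
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* FilePathList begins right after the description's terminating NUL,
   so the NUL must be included in the length. */
static size_t file_path_list_offset(const uint16_t *desc)
{
    return LOAD_OPTION_HEADER_SIZE + (char16_len(desc) + 1) * sizeof(uint16_t);
}

/* Serialize a load option into buf. 'dp' is an encoded device path of
   dp_len bytes, including its End node; OptionalData is omitted.
   Returns the total size, or 0 if buf is too small. */
static size_t build_load_option(uint8_t *buf, size_t buf_size,
                                uint32_t attributes, const uint16_t *desc,
                                const uint8_t *dp, uint16_t dp_len)
{
    size_t len = char16_len(desc);
    size_t off = file_path_list_offset(desc);
    size_t total = off + dp_len;

    if (total > buf_size)
        return 0;
    put_le32(buf, attributes);
    put_le16(buf + 4, dp_len);                       /* FilePathListLength */
    for (size_t i = 0; i <= len; i++)                /* Description + NUL */
        put_le16(buf + LOAD_OPTION_HEADER_SIZE + 2 * i, desc[i]);
    memcpy(buf + off, dp, dp_len);                   /* FilePathList */
    return total;
}
```

For a description of "USB" (three characters plus the terminator), FilePathList starts at byte 6 + 2 * 4 = 14.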
The result was that the machine would start up, display the Emerson banner and
the "Press F2 or F7" prompt, and then get stuck. As with the Dell, trying to
hit F2 to enter setup or F7 to access the BBS screen would fail (it would say
"Entering Setup..." after hitting F2, but would never make it.)
I enabled the BIOS recovery jumper and tried to reload the BIOS image via the
serial port -- it allowed me to do that, but this apparently didn't reset the
NVRAM contents because the system was _still_ wedged.
Finally I remembered that, this being a development board, it has an XDP
connector on it and I have a Wind River ICE2 which is compatible with it.
Using the ICE I was able to take control of the CPU and found it was stuck in
a loop. I forced it to break out of the loop... and then found it got caught
in another outer loop. I forced it out of that one too, and it got farther
along and launched the PXE option ROM. It got stuck before booting correctly,
but when I reset it this time I found it had updated the NVRAM contents to
remove the BootOrder entry for the bad boot path and it was no longer bricked.
Now, I don't know if this is a general problem with the EDKII code or not. I
tried to trigger the same condition with OVMF, but I may be hamstrung by the
fact that the QEMU/OVMF combo doesn't actually emulate a persistent flash
device. I'm also simulating booting from a hard disk drive rather than a USB
thumb drive. (I'm not sure how to do the latter.) OVMF seems to retain the
bogus boot path in the data store -- I can see the BootXXXX variable with
dmpstore from the shell and I can see that BootOrder is still set to consider
it first -- but it seems to be smart enough to ignore it as I always wind up
back at the shell prompt rather than entering my boot loader.
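It isn't clear which checks OVMF applies when it skips the bogus entry, but these are the kinds of sanity checks a boot manager could run on a Boot#### variable before trusting it. The sketch below is hypothetical (not EDK II code): it verifies that the description is terminated inside the variable, that FilePathListLength fits, and that the first device path in FilePathList is made of well-sized nodes ending in an End of Entire Device Path node (Type 0x7F, SubType 0xFF). A FilePathList written at the wrong offset will usually fail the node-length check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns true if opt (size bytes) looks like a well-formed EFI_LOAD_OPTION.
   Only FilePathList[0] is walked; further device paths may follow it. */
static bool load_option_is_sane(const uint8_t *opt, size_t size)
{
    size_t off = 6, fpl_len, end;

    if (size < 6)
        return false;
    fpl_len = get_le16(opt + 4);

    /* Skip the CHAR16 description, including its NUL terminator. */
    for (;;) {
        if (off + 2 > size)
            return false;          /* no terminator inside the variable */
        off += 2;
        if (get_le16(opt + off - 2) == 0)
            break;
    }

    /* off is now the FilePathList offset. */
    end = off + fpl_len;
    if (fpl_len < 4 || end > size)
        return false;

    /* Each node: UINT8 Type, UINT8 SubType, UINT16 Length (>= 4). */
    while (off + 4 <= end) {
        uint8_t type = opt[off];
        uint8_t subtype = opt[off + 1];
        uint16_t len = get_le16(opt + off + 2);

        if (len < 4 || off + len > end)
            return false;
        off += len;
        if (type == 0x7F && subtype == 0xFF)
            return true;           /* first device path properly terminated */
    }
    return false;                  /* ran out of data without an End node */
}
```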
There are several possibilities here:
1) The problem is specific to a particular vendor implementation of UEFI
2) The problem is not specific to any vendor but has been fixed in the EDKII
code already and I'm just rehashing ancient history
3) The problem only occurs with certain boot devices (USB?)
4) The problem only occurs with certain specific invalid data in the BootXXXX
variable
5) I really do need to simulate having a persistent NVRAM in order to pass
through the right code path to trigger the failure
One thing I would like to know is whether, in general, UEFI
evaluates all the boot paths before allowing the user to enter the setup
screen. I found it very frustrating that the machine would taunt me with the
catch-22 of getting stuck before letting me into the setup menu with which I
could cure the problem that was getting it stuck.
-Bill
--
=============================================================================
-Bill Paul (510) 749-2329 | Senior Member of Technical Staff,
[email protected] | Master of Unix-Fu - Wind River Systems
=============================================================================
"I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/edk2-devel