So, while working on my UEFI OS loader, I managed to brick a couple of 
machines. I recovered all but one of them so far (sadly that one is my 
personal tablet -- there may yet be a way to save it but I'm holding off on it 
until I'm better prepared for the attempt).

I want people to be able to set their machines to boot VxWorks via UEFI 
unattended, preferably in a consistent fashion. (Consistent, because there 
seems to be enough variation in setup UIs that I can't count on the user 
always being able to figure it out.)

Now, I know UEFI handles boot path selection using the BootXXXX and BootOrder 
variables, and I know that OSes can use the GetVariable()/SetVariable() 
runtime services to set up the boot order during install.

Having VxWorks call the UEFI runtime service routines is a bit of a hassle 
though (especially if you're running a 32-bit VxWorks on a 64-bit UEFI system) 
and being an embedded OS we don't go through the same installation process as 
a self-hosted OS like Windows anyway, so instead I fixed it so that you can 
configure the boot path in the OS loader. After the OS loader starts, you can 
hit a key to pause the OS launch and then enter a small boot manager menu 
where you can add or remove boot path entries.

Note that I am testing with UEFI 2.3.x systems, but they're not particularly 
recent. One example system is from Emerson Network Power and it has an AMI 
BIOS from 2009. Actually, so far all three of the machines I managed to 
clobber have AMI BIOSes, but are from different hardware vendors. (One is from 
Dell.)

The tests I do involve setting the system to boot from a USB thumb drive. The 
first case that I encountered was with a Dell laptop. I was correctly 
adding a new BootXXXX variable and updating the BootOrder, and booting from 
the thumb drive worked. As near as I can figure, the problem was that the 
routine to undo this change was faulty: it would delete the recently created 
BootXXXX variable but fail to update the BootOrder. This left the laptop stuck 
trying to access a BootXXXX variable that didn't exist. (For example, the 
BootOrder could be saying 0001000200030004, yet there is no Boot0001 variable 
present.) Hitting F2 caused it to say "Entering setup..." but it would never 
get there. Similarly, hitting F12 would cause it to say "Preparing one-time 
boot options..." (or something similar) but it would never reach that point 
either.
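
To illustrate the removal step, here is a minimal sketch of the BootOrder side 
of it. BootOrder is an array of 16-bit option numbers, so undoing an entry 
means filtering its number out of that array as well as deleting the BootXXXX 
variable. The helper name boot_order_remove is hypothetical and this is not 
the loader's actual code; it only works on an in-memory copy of the array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Remove every occurrence of boot_num from an in-memory copy of BootOrder.
 * The surviving entries keep their relative order.
 * Returns the new number of entries in the array.
 * (Hypothetical helper for illustration only.)
 */
size_t boot_order_remove(uint16_t *order, size_t count, uint16_t boot_num)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (order[i] != boot_num)
            order[kept++] = order[i];
    }
    return kept;
}
```

One plausible ordering, offered as an assumption rather than a requirement: 
write the shortened array back with SetVariable() first, and only then delete 
the BootXXXX variable. If the sequence is interrupted, that leaves an unused 
BootXXXX behind instead of a BootOrder entry pointing at nothing.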

I finally managed to un-wedge it by disconnecting the hard disk (which luckily 
was easy to do on this model laptop as it was designed to be upgraded). After 
I did that and rebooted the machine a couple of times, it finally seemed to go 
back to normal. I suspect that unplugging the disk caused the firmware to re-
probe the available boot devices and finally it updated the BootOrder such 
that it threw out the bogus entry.

I found the bug in my code that was preventing the boot path removal from 
resetting BootOrder correctly and fixed it, and that took care of the issue. I 
was a bit annoyed that I had almost irretrievably bricked the machine 
(especially since it was on loan from IT) but once I had it back I forgot 
about it.

Then recently I managed to introduce another bug which caused me to brick my 
Emerson board and my tablet. The failure here was a little different. From 
what I can tell in this case, I was adding an incorrectly formatted BootXXXX 
variable. The "Firmware Boot Manager" section in chapter 11 of the Beyond BIOS 
book describes the EFI_LOAD_OPTION structure. You're supposed to have a 
Description field followed by a FilePathList array. The FilePathList is 
supposed to immediately follow the null-terminated Unicode description string. 
In my case, I was calculating the offset for the FilePathList incorrectly and 
placing it farther away.
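
As a sketch of the layout described above: the packed header is a 32-bit 
Attributes field and a 16-bit FilePathListLength field (6 bytes), then the 
description as 16-bit characters including its terminator, then the 
FilePathList with no padding in between. The function names below are 
hypothetical, the code assumes a little-endian host (as on x86), and it is 
not the loader's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* UINT32 Attributes + UINT16 FilePathListLength, packed with no padding */
#define LOAD_OPTION_HDR_SIZE 6u

/* Length of a NUL-terminated 16-bit string in characters, excluding the NUL */
static size_t ucs2_len(const uint16_t *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/* Byte offset of FilePathList: header, then description plus terminator */
size_t load_option_path_offset(const uint16_t *desc)
{
    return LOAD_OPTION_HDR_SIZE + (ucs2_len(desc) + 1) * sizeof(uint16_t);
}

/*
 * Assemble a load option into buf. Returns the number of bytes written,
 * or 0 if buf is too small. Assumes a little-endian host.
 */
size_t load_option_build(uint8_t *buf, size_t buf_size, uint32_t attributes,
                         const uint16_t *desc,
                         const uint8_t *path, uint16_t path_len)
{
    size_t path_off = load_option_path_offset(desc);
    size_t total = path_off + path_len;

    if (total > buf_size)
        return 0;

    memcpy(buf, &attributes, sizeof(attributes));
    memcpy(buf + 4, &path_len, sizeof(path_len));
    memcpy(buf + LOAD_OPTION_HDR_SIZE, desc, path_off - LOAD_OPTION_HDR_SIZE);
    memcpy(buf + path_off, path, path_len);
    return total;
}
```

Deriving the offset in one place and reusing it for both the copy and the 
total size keeps the two from drifting apart, which is the kind of mismatch 
described here.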

The result was that the machine would start up, display the Emerson banner and 
the "Press F2 or F7" prompt, and then get stuck. As with the Dell, trying to 
hit F2 to enter setup or F7 to access the BBS screen would fail (it would say 
"Entering Setup..." after hitting F2, but would never make it.)

I enabled the BIOS recovery jumper and tried to reload the BIOS image via the 
serial port -- it allowed me to do that, but this apparently didn't reset the 
NVRAM contents because the system was _still_ wedged.

Finally I remembered that, this being a development board, it has an XDP 
connector on it and I have a Wind River ICE2 which is compatible with it. 
Using the ICE I was able to take control of the CPU and found it was stuck in 
a loop. I forced it to break out of the loop... and then found it got caught 
in another outer loop. I forced it out of that one too, and it got farther 
along and launched the PXE option ROM. It got stuck before booting correctly, 
but when I reset it this time I found it had updated the NVRAM contents to 
remove the BootOrder entry for the bad boot path and it was no longer bricked.

Now, I don't know if this is a general problem with the EDKII code or not. I 
tried to trigger the same condition with OVMF, but I may be hamstrung by the 
fact that the QEMU/OVMF combo doesn't actually emulate a persistent flash 
device. I'm also simulating booting from a hard disk drive rather than a USB 
thumb drive. (I'm not sure how to do the latter.) OVMF seems to retain the 
bogus boot path in the data store -- I can see the BootXXXX variable with 
dmpstore from the shell and I can see that BootOrder is still set to consider 
it first -- but it seems to be smart enough to ignore it as I always wind up 
back at the shell prompt rather than entering my boot loader.
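
For illustration, here is a sketch of the kind of sanity check that would let 
a boot manager skip a malformed option rather than act on it. This is an 
assumption about what such a check could look like, not EDKII's or any 
vendor's actual code, and load_option_is_sane is a hypothetical name. It 
finds the description terminator within bounds, then walks the device path 
nodes and requires an end-of-path node (type 0x7F, subtype 0xFF) inside the 
declared FilePathListLength.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the raw load option has a terminated description and its
 * first device path is properly terminated inside FilePathListLength.
 * (Hypothetical check for illustration only.)
 */
bool load_option_is_sane(const uint8_t *opt, size_t size)
{
    if (size < 6)
        return false;

    /* FilePathListLength is a little-endian 16-bit field at offset 4 */
    size_t path_len = (size_t)opt[4] | ((size_t)opt[5] << 8);

    /* Scan the 16-bit description for its terminator, staying in bounds */
    size_t pos = 6;
    for (;;) {
        if (pos + 2 > size)
            return false;
        bool nul = (opt[pos] == 0 && opt[pos + 1] == 0);
        pos += 2;
        if (nul)
            break;
    }

    if (path_len > size - pos)
        return false;
    size_t end = pos + path_len;

    /* Each node: type, subtype, 16-bit length that includes this header */
    while (pos + 4 <= end) {
        uint8_t type = opt[pos];
        uint8_t sub = opt[pos + 1];
        size_t node_len = (size_t)opt[pos + 2] | ((size_t)opt[pos + 3] << 8);

        /* A length under 4 would stall or misdirect a naive walker */
        if (node_len < 4 || node_len > end - pos)
            return false;
        if (type == 0x7F && sub == 0xFF)
            return true;
        pos += node_len;
    }
    return false;
}
```

With a FilePathList placed a few bytes too far out, the walker lands on zero 
bytes where the first node should be and rejects the option because the node 
length is 0. Whether the wedged firmware lacked a check like this is not 
something I can confirm.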

There are several possibilities here:

1) The problem is specific to a particular vendor implementation of UEFI
2) The problem is not specific to any vendor but has been fixed in the EDKII 
code already and I'm just rehashing ancient history
3) The problem only occurs with certain boot devices (USB?)
4) The problem only occurs with certain specific invalid data in the BootXXXX 
variable
5) I really do need to simulate having a persistent NVRAM in order to pass 
through the right code path to trigger the failure

One thing I would like to know is whether it's the case that, in general, UEFI 
evaluates all the boot paths before allowing the user to enter the setup 
screen. I found it very frustrating that the machine would taunt me with the 
catch-22 of getting stuck before letting me into the setup menu with which I 
could cure the problem that was getting it stuck.

-Bill

-- 
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
                 [email protected] | Master of Unix-Fu - Wind River Systems
=============================================================================
   "I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/edk2-devel