The mysterious call to the floppy probing in stage 1 is still an unresolved
issue to me. As Okuji-san advised, I first read through the mailing list
archive. But after reading a year's worth of material, I got sick and tired
of it. To shed some light on the situation, I went through the revisions of
stage 1 in CVS instead.
Here is my report, for those of you interested in software development,
especially that of stage 1. I focused on the topics of the five patches I sent in
recently, and on the hard-to-understand program logic around the probing BIOS call
in the CHS code path of stage 1.
A reader should have some familiarity with the contents of stage 1 to follow the
details. To me, it was instructive and enlightening to see how common
shortcomings in software design and documentation eventually evolve into
bugs, errors and misconceptions.
Have fun
Wolf Lammen
A stroll through the history of GRUB stage 1
It all started in 1996 when Erich Boleyn posted version 1.1 of GRUB stage 1. This tiny
program has to find its successor, stage 2, on a disk, be it a floppy or a hard disk,
load the first 512 bytes of it into RAM and transfer control to it.
Stage 1 is the very first program started after a computer has booted. Therefore it
has no helper library available other than the BIOS. Furthermore, it is confined to
about 400 bytes of code, which does not allow for any luxury. So it is coded in
assembler, using low-level techniques. To make things even more difficult, the BIOS
does not prove to be a very reliable companion.
All of this is quite difficult for human beings to maintain and understand. I'd like
to say this first, before I point to errors of others, errors that I might have made,
too, if I had been involved in the development, perhaps suffering from time or
budget pressure. To me, those errors are understandable.
The following gives a short introduction to the terminology of disk and boot
technology. Readers familiar with stage 1 may safely skip this section and continue
after the next dashed line (---).
Stage 1 fulfills its task in roughly 3 steps: Firstly, it initializes itself.
Secondly, it finds out how disks are accessed best. This step is called 'probing'.
Finally, it loads the first sector of stage 2 and starts it. Throughout this text, we
will look at the first two steps only.
Historically, hard and floppy disks were seen as a stack of one or more platters, with
read/write heads moving across the surfaces, one head for each side of a platter,
finding tracks on it, and pieces of data, called sectors, within these tracks. All the
information about the number of tracks, heads and sectors is packed into a piece of
data called the geometry (CHS: cylinders, heads, sectors) of a disk. Until a few years
ago, one had to find out the geometry first, before one was able to access a disk.
Floppy disks were the first kind of disks broadly available. Soon there were a bunch
of formats around: 3.5", 5.25", single-sided, high-density and so on. All of those
formats had a different geometry, and it was anything but an easy task to determine the
geometry of a given disk inserted into a drive. Early programs simply tried out several
geometries, instructed the drive to read specific sectors, and guessed the geometry of
the floppy disk from error or success. This method is called 'floppy probing' and
it has been part of GRUB stage 1 from version 1.1 on.
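
The trial-and-error idea described above can be sketched in a few lines. This is only
an illustration in Python, not the stage 1 code: the candidate list, the function names
and the simulated floppy are all assumptions made for the example.

```python
# Candidate sectors-per-track values, tried from largest to smallest.
# The concrete numbers are an assumption for this sketch (typical values
# for 2.88 MB, 1.44 MB, 1.2 MB and 720 KB/360 KB formats).
PROBE_VALUES = [36, 18, 15, 9]


def make_fake_floppy(sectors_per_track):
    """Return a hypothetical read function that simulates a floppy disk."""
    def read_sector(cylinder, head, sector):
        # Sectors are numbered from 1; reading past the end of a track fails.
        return 1 <= sector <= sectors_per_track
    return read_sector


def probe_sectors_per_track(read_sector):
    """Guess the track size by reading the highest candidate sector first."""
    for candidate in PROBE_VALUES:
        if read_sector(0, 0, candidate):
            return candidate  # first successful read determines the guess
    return None  # no candidate worked


print(probe_sectors_per_track(make_fake_floppy(18)))
```

The guess is only as good as the candidate list: a format whose track size is not in
the list is either rejected or mistaken for the next smaller candidate.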
With the advent of hard drives, which are not removable, a fixed geometry could be
tied to a given drive. The BIOS started to let clients retrieve this value. GRUB stage
1 utilizes such a BIOS call, too. It is referred to as INT 0x13, function 8.
The same BIOS call can be applied to floppy drives. But because differently
formatted media can be inserted into such a drive, the BIOS returns the maximum
capability of the drive here. Thinking of a 1.44 MB formatted disk inserted into a
2.88 MB drive, one can see that there is a difference between the geometry of the
media and that of the drive. I refer to this as the drive/media semantics problem.
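
For readers who want to see what such a call hands back: INT 0x13, function 8, returns
the maximum cylinder, head and sector numbers packed into the registers CH, CL and DH.
The following Python sketch decodes them; the function name is made up for this
example, and the register values are passed in as plain integers.

```python
def decode_geometry(ch, cl, dh):
    """Decode the CH/CL/DH values returned by INT 0x13, function 8.

    CH holds the low 8 bits of the maximum cylinder number, the top two
    bits of CL hold its high bits, the low six bits of CL hold the maximum
    sector number, and DH holds the maximum head number.
    """
    max_cylinder = ((cl & 0xC0) << 2) | ch
    sectors_per_track = cl & 0x3F   # sectors count from 1, so max == count
    heads = dh + 1                  # heads count from 0
    cylinders = max_cylinder + 1    # cylinders count from 0
    return cylinders, heads, sectors_per_track


# A drive reporting a 1.44 MB geometry: 80 cylinders, 2 heads, 18 sectors.
print(decode_geometry(79, 18, 1))
```

Note that for a floppy drive these numbers describe the drive, not necessarily the
disk that happens to be inserted.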
Nowadays hardware has become intelligent enough to hide the internals of a disk from
its users. One simply tells the BIOS to access the n-th sector on a disk, without
caring about geometry any more. This method is called LBA mode. Most of today's
BIOSes support this kind of access. However, before using such an advanced technique,
a client such as GRUB stage 1 has to make sure it is available. This 'LBA probing' is
done by calling a special BIOS function, INT 0x13, function 0x41 (the sector reads
themselves then go through function 0x42). It must be said that the
early versions of GRUB stage 1 were not equipped with it.
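
The relation between the two addressing schemes is plain arithmetic once the geometry
is known. A minimal Python sketch of the standard conversion follows; the function and
parameter names are chosen for this example only.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a CHS address to a linear sector number (LBA)."""
    # Cylinders and heads count from 0, sectors count from 1.
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear sector number back to a (C, H, S) triple."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    return cylinder, head, sector


# On a 1.44 MB floppy (2 heads, 18 sectors per track):
print(chs_to_lba(1, 0, 1, heads=2, sectors_per_track=18))
print(lba_to_chs(2879, heads=2, sectors_per_track=18))
```

This also shows why a wrong geometry is harmful in the CHS code path: with the wrong
number of sectors per track, the same linear sector maps to a different place on disk.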
The BIOS assigns a number to each drive in a system. The numbering scheme is such
that hard drives are assigned values from 128 (0x80) upwards, whereas drives with
removable media get the range from 0 to 127. The limit 127 is not chosen randomly: in
binary notation you can tell the two ranges apart by simply checking the contents of a
single bit (bit 7). Testing this bit, called the HDD-Flag in this text, thus tests
whether a drive number belongs to a floppy or a hard drive. Stage 1 has to tell a
floppy from a hard disk, because they have to be probed differently.
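
Expressed in Python, the single-bit test looks like this. The names `HDD_FLAG` and
`is_hard_disk` are made up for this sketch.

```python
HDD_FLAG = 0x80  # bit 7 of the BIOS drive number


def is_hard_disk(drive_number):
    """Return True if the BIOS drive number lies in the hard disk range."""
    return (drive_number & HDD_FLAG) != 0


for drive in (0x00, 0x01, 0x80, 0x81):
    print(hex(drive), "hard disk" if is_hard_disk(drive) else "removable")
```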
Throughout this text, I will sometimes refer to registers and flags of the x86
processor, namely the ES and SS segment registers, the SP stack pointer register, and
two flags, the direction flag (DF) and the interrupt flag (IF). It's beyond the scope
of this text to introduce the details of these registers. It might help to think
of them as storage locations holding values and defining the state of the processor.
There are obscure dependencies between flags, registers and the way the processor
behaves. You are left alone with that, sorry. Hopefully you need not understand the
details to grasp the gist of this text, but I cannot guarantee that.
------
Version 1.1
Stage 1, version 1.1, had a clear approach with respect to initialization and probing.
Drives with their HDD-Flag clear underwent floppy probing, all others were probed by
INT 0x13, function 8. Obviously Erich was aware of the drive/media semantics problem of
this BIOS call.
The initialization sequence put the processor into a known state. Namely, ES and the
flag DF (direction flag) were cleared. Yet Erich did not make sure that IF (interrupt
flag) was set; he relied on the BIOS/chainloader here (I have no confirmation that
IF=1 is a mandatory part of the interface between BIOS and stage 1, so one may see
this as a minor bug).
Floppy probing used memory location 0x0000:0x2000 as a temporary disk buffer, the area
behind the stack. It implicitly assumed ES to be zero. This precondition is fulfilled
in this version, but Erich failed to state it explicitly, so we will see two later
patches violating it.
The floppy probing has some shortcomings with respect to some (now really unused)
floppy types like single-sided disks, but all-in-all it looks quite sound to me.
Version 1.3
This version moved the setting of the DF near to the instruction that utilizes it.
Whilst this is a small improvement with respect to stability (prior BIOS calls
clobbering this flag will not affect the outcome any more), it weakens the
initialization sequence. We will see a later patch stumble over this.
This revision removes the initialization of register ES at the beginning. Now the
aforementioned precondition of the floppy probing is at stake. Most BIOSes will
still set ES to 0 upon calling the bootloader, but again, I have no confirmation that
this is mandatory.
A CLI/STI pair is introduced in the initialization sequence. Though this finally
ensures that IF is set, the usage is strange, as I pointed out in my patch #1. Perhaps
Erich saw a race condition when updating the stack pointer in SS:SP with two
instructions. If so, he was unaware of a special feature of the x86 processor: after
a write to SS, interrupts are held off until the following instruction has completed.
Properly used (SP is loaded by the instruction immediately following the load of SS),
this automatically prevents a stale intermediate stack pointer. A simple STI would
have been sufficient here.
Version 1.5
This version was either made in a hurry, or the author had little insight into the
working of stage 1. It introduced several bugs at the same time.
According to the comments, its objective was to include support for super disks. This
kind of hardware is an extension to floppy drives, allowing for huge disks up to 120
MB.
Making this kind of hardware bootable posed a severe problem for the floppy probing:
One could no longer safely guess the geometry of the disk from the track size. So
this revision wanted all drives to be probed by INT 0x13, function 8. As a side effect,
the result was susceptible to the drive/media semantics problem. Not caring about
this was bug #1 introduced by this revision.
Maybe the author thought that the BIOS call fails on floppy drives (which it does not);
maybe it was because some buggy Compaq boxes hung on the floppy probing, as Okuji-san
claims. Either way, this revision effectively killed the floppy probing. Instead of
looking at register BL, sorting out the super disk case and passing the floppy disk
cases on to the floppy probing, it mysteriously invoked the floppy probing only when
the BIOS call fails completely. I am unable to envision a situation in which this
behaviour makes sense. [One has to see that INT 0x13, function 8, with floppy drives,
retrieves its information from the CMOS RAM in the battery-powered RTC rather than
probing the drive itself. So, unless the drive does not exist (is not registered with
the BIOS setup, to be precise), or the battery is exhausted, the BIOS call will
(almost) never fail.] To me, the program logic simply renders the floppy probing
useless. If this was intentional, the floppy probing should rather have been removed
completely. If there really was any sense in this construction, it should have been
commented. As it is, I'm very much inclined to see the program logic here as bug #2.
This version tested the HDD-Flag using an AND instruction, thereby, as a side-effect,
destroying the drive information. This bug #3 was corrected in a later patch.
Finally, INT 0x13, function 8, modifies the ES register, which points to the Disk
Parameter Table afterwards. This, again, breaks the precondition on the floppy probing
path. This is bug #4.
Version 1.10
This version moved all constants into a separate header file, renaming most of them.
There seems to have been some confusion about what a segment is. Two constants,
BUFFER_SEG and STACK_SEG, are effectively used as offsets within a segment in two
places. Here, the naming and the semantics of the constants do not coincide. Whereas
STACK_SEG is used only once (thus limiting the damage to misleading readers),
BUFFER_SEG is used once as a true segment and once as an offset, which is clearly a bug.
So, this revision moves the temporary disk buffer location of the floppy probing path
from ????:0x2000 to ????:????, where ???? denotes an unintended random value
(Remember: segment register ES was clobbered before).
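
Why it matters whether a constant is used as a segment or as an offset can be seen
from the real-mode address calculation. The sketch below is in Python; the value
0x2000 and the clobbered ES value 0xF000 are hypothetical and only serve to show the
effect, they are not taken from the header file.

```python
def linear_address(segment, offset):
    """Real-mode address: the segment is shifted left by 4 bits, then the
    offset is added (wrap-around at 1 MB is ignored in this sketch)."""
    return (segment << 4) + offset


VALUE = 0x2000  # hypothetical constant, used once as offset, once as segment

print(hex(linear_address(0x0000, VALUE)))  # as offset with ES = 0
print(hex(linear_address(VALUE, 0x0000)))  # as segment with offset 0
print(hex(linear_address(0xF000, VALUE)))  # as offset with a clobbered ES
```

The same number thus names three different memory locations, depending on how it is
used and on what ES happens to contain.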
Version 1.17
The LBA mode is coded in this revision. A then unused label 'lba_mode' is introduced,
denoting the beginning of the LBA loader. Its semantics and placement were not
clarified sufficiently, so later patches are misled.
Version 1.25
An optimization in the message routine introduces a LODS instruction, which depends on
the uninitialized DF processor flag. The weakened processor initialization strikes
back here.
Version 1.33
The LBA probing mode is revised and a now unused chunk of code is removed. The
misplaced/misnamed label 'lba_mode' is misinterpreted, so the removal is not complete.
A superfluous instruction is left behind (cf. my patch #3).
Version 1.34
A patch location is placed in the middle of an instruction, which I consider a really
bad idea. Now any modification of the code will move that location around, and,
unfortunately, a constant in a separate header file depends on it! There is no
way to automatically update this implicit dependence on the outcome of a compiler run.
Such a construction is simply asking for trouble. In fact, I was already bitten by
it when playing with the code. More information can be found in my patch #4.
As a side effect, it now becomes difficult to maintain any 'inter-version
compatibility'.
Wolf Lammen