The mysterious call to the floppy probing in stage 1 is still an unresolved
issue for me. As Okuji-san advised, I first read through the archived mailing
list. But after reading a year's worth of material, I finally got sick
and tired of it. To shed some light on the situation, I went through the
revisions of stage 1 in CVS instead.
Here is my report, for those of you interested in software development,
especially that of stage 1. I focused on the topics of the 5 patches I sent in
recently, and on the hard-to-understand program logic around the probing BIOS call
in the CHS code path of stage 1.
A reader should have some familiarity with the contents of stage 1 to follow the
details. To me, it was instructive and enlightening to see how common
shortcomings in software design and documentation eventually evolve into
bugs, errors and misconceptions.

Have fun

Wolf Lammen


A stroll through the history of GRUB stage 1

It all started in 1996 when Erich Boleyn posted version 1.1 of GRUB stage 1. This tiny
program has to find its successor, stage 2, on a disk, be it a floppy or a hard disk,
load its first 512 bytes into RAM and transfer control to it.
Stage 1 is the very first program started after a computer has booted. Therefore it
has no helper library available other than the BIOS. Furthermore, it is confined to
about 400 bytes of code, which does not allow for any luxury. So it is coded in
assembler, using low-level techniques. To make things even more difficult, the BIOS
does not prove to be a very reliable companion.
All this is quite difficult for human beings to maintain and understand. I'd like to
say this first, before I point to errors of others, errors that I might have made,
too, had I been in the development process, perhaps under time or budget pressure.
To me, those errors are understandable.

The following gives a short introduction to the terminology of disk and boot
technology. Readers familiar with stage 1 may safely skip this section and continue
after the next dashed line (---).

Stage 1 fulfills its task in roughly 3 steps: Firstly, it initializes itself.
Secondly, it finds out how disks are best accessed. This step is called 'probing'.
Finally, it loads the first sector of stage 2 and starts it. Throughout this text, we
will look at the first two steps only.
Historically, hard and floppy disks were seen as an assembly of several disks, with
read/write heads moving across the surface of a disk, one for each side of a disk,
finding tracks on it, and pieces of data, called sectors, within these tracks. All the
information about the number of cylinders, heads and sectors per track is packed into
a piece of data called the geometry (CHS) of a disk. Until a few years ago, one had to
find out the geometry first, before one was able to access a disk.
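To make the CHS idea concrete, here is a small sketch of the usual mapping between a
CHS address and a linear sector number (the numbering LBA mode uses, see below). It
assumes sectors counted from 1 and heads and cylinders counted from 0, as the BIOS
does; the function names are made up for illustration.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Map a CHS address (sectors numbered from 1) to a linear sector number."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector number out of range for this geometry")
    if not 0 <= head < heads:
        raise ValueError("head number out of range for this geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    """Inverse mapping: linear sector number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


# A 1.44 MB floppy: 80 cylinders, 2 heads, 18 sectors per track.
print(chs_to_lba(1, 0, 1, heads=2, sectors_per_track=18))  # 36
print(lba_to_chs(2879, heads=2, sectors_per_track=18))     # (79, 1, 18)
```

Without knowing the geometry, neither direction can be computed, which is why it had
to be found out before the first access.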
Floppy disks were the first kind of disks broadly available. Soon there were a bunch
of formats around: 3.5", 5.25", single-sided, high-density and so on. All of those
formats had a different geometry, and it was anything but easy to determine the
geometry of a given disk inserted into a drive. Early programs simply tried out several
geometries, instructed the drive to read specific sectors, and guessed the geometry of
the floppy disk from error or success. This method is called 'floppy probing', and
it has been part of GRUB stage 1 from version 1.1 on.
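The probing idea can be sketched as follows. This is not the stage 1 code: the
simulated drive, the function names and the candidate list of sectors per track are
assumptions for illustration. Trying the largest candidate first matters, because a
read of sector 9 would succeed on every format in the list.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidate sectors-per-track values, largest first:
# 36 (2.88 MB), 18 (1.44 MB), 15 (1.2 MB), 9 (360/720 KB).
CANDIDATE_SPT = (36, 18, 15, 9)

# read_sector(cylinder, head, sector) -> True on success, False on error
ReadSector = Callable[[int, int, int], bool]


def probe_floppy(read_sector: ReadSector,
                 candidates: Sequence[int] = CANDIDATE_SPT) -> Optional[int]:
    """Guess sectors per track by trying to read the last sector of track 0."""
    for spt in candidates:
        if read_sector(0, 0, spt):
            return spt  # the highest sector that exists wins
    return None         # unknown format


def simulated_drive(media_spt: int) -> ReadSector:
    """A fake drive whose inserted disk has media_spt sectors per track."""
    def read_sector(cylinder: int, head: int, sector: int) -> bool:
        return 1 <= sector <= media_spt
    return read_sector


print(probe_floppy(simulated_drive(18)))  # 18
print(probe_floppy(simulated_drive(8)))   # None
```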
With the advent of hard drives, which are not removable, a fixed geometry could be tied
to a given drive. The BIOS started to support clients in retrieving this value. GRUB
stage 1 utilizes such a BIOS call, too. It is referred to as INT 0x13, function 8.
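Function 8 returns the geometry packed into registers, as maximum values: CH holds the
low 8 bits of the highest cylinder number, CL bits 0-5 the highest sector number, CL
bits 6-7 the top 2 bits of the cylinder number, and DH the highest head number. A
minimal decoding sketch (the helper name is hypothetical) might look like this:

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int
    heads: int
    sectors_per_track: int


def decode_int13_fn8(ch: int, cl: int, dh: int) -> Geometry:
    """Turn the CH/CL/DH results of INT 0x13, function 8 into counts."""
    max_cylinder = ch | ((cl & 0xC0) << 2)  # 10-bit cylinder number
    max_sector = cl & 0x3F                  # sectors start at 1, so max == count
    max_head = dh
    return Geometry(max_cylinder + 1, max_head + 1, max_sector)


# A 1.44 MB drive reports cylinder 79, sector 18, head 1.
print(decode_int13_fn8(ch=79, cl=18, dh=1))
# Geometry(cylinders=80, heads=2, sectors_per_track=18)
```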
The same BIOS call can be applied to floppy drives. But because differently
formatted media can be inserted into such a drive, the BIOS returns the maximum
capability of the drive here. Thinking of a 1.44 MB formatted disk inserted into a
2.88 MB drive, one can see that there is a difference between media geometry and
drive geometry. I refer to this as the drive/media semantics problem.
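A quick calculation shows what goes wrong if the drive geometry is used for the medium.
The sketch assumes the usual geometries of 80 cylinders and 2 heads, with 36 sectors
per track for a 2.88 MB drive and 18 for a 1.44 MB disk; the linear-to-CHS mapping is
the standard one.

```python
SECTOR_SIZE = 512

DRIVE_GEOMETRY = (80, 2, 36)  # what function 8 reports for a 2.88 MB drive
MEDIA_GEOMETRY = (80, 2, 18)  # what is actually on a 1.44 MB disk


def capacity(cylinders: int, heads: int, sectors_per_track: int) -> int:
    return cylinders * heads * sectors_per_track * SECTOR_SIZE


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


# Linear sector 18 is the first sector on head 1 of the 1.44 MB disk ...
print(lba_to_chs(18, MEDIA_GEOMETRY[1], MEDIA_GEOMETRY[2]))  # (0, 1, 1)
# ... but the drive geometry asks for sector 19 on head 0, which does not exist.
print(lba_to_chs(18, DRIVE_GEOMETRY[1], DRIVE_GEOMETRY[2]))  # (0, 0, 19)
```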
Nowadays, hardware has become intelligent enough to hide the internals of a disk from
its users. One simply tells the BIOS to access the n-th sector on a disk, without
caring about geometry any more. This method is called LBA mode. Most of today's
BIOSes support this kind of access. However, before using such an advanced technique, a
client such as GRUB stage 1 has to make sure that it is available. This 'LBA probing' is
done by calling a special BIOS function, INT 0x13, function 0x42. It must be said that
the early versions of GRUB stage 1 were not equipped with it.
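Unlike function 8, function 0x42 does not deal with a geometry at all; it expects DS:SI
to point to a small 16-byte 'disk address packet' holding the number of sectors, the
target buffer (offset, then segment) and a 64-bit linear sector number. The layout below
follows the usual description of the BIOS disk extensions; the helper name and the
example buffer address are my own assumptions.

```python
import struct

# Little-endian: size, reserved, sector count, buffer offset, buffer segment, LBA
DAP_FORMAT = "<BBHHHQ"


def make_disk_address_packet(count: int, segment: int, offset: int, lba: int) -> bytes:
    """Build the 16-byte packet for INT 0x13, function 0x42 (extended read)."""
    packet = struct.pack(DAP_FORMAT, 0x10, 0, count, offset, segment, lba)
    assert len(packet) == 0x10  # the first byte announces this size
    return packet


# Read one sector at LBA 1 into 0x0800:0x0000 (hypothetical buffer address).
print(make_disk_address_packet(1, 0x0800, 0x0000, 1).hex())
# 10000100000000080100000000000000
```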
The BIOS assigns a number to each drive in a system. The numbering scheme is such
that hard drives are assigned values above 127, whereas drives with removable media
get the range between 0 and 127. The limit 127 is not chosen randomly: in binary
notation, you can tell values below and above it apart by simply checking a single bit,
bit 7 (value 0x80). Testing this bit, called the HDD-Flag, thus tells whether a drive
number belongs to a floppy or a hard drive. Stage 1 has to tell a floppy from a hard
disk, because they have to be probed differently.
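In code, the HDD-Flag test is a single bit mask. The sketch below (names are mine) also
checks exhaustively that testing bit 7 is the same as comparing against 127.

```python
HDD_FLAG = 0x80  # bit 7 of the BIOS drive number


def is_hard_disk(drive: int) -> bool:
    """True for BIOS drive numbers 0x80..0xFF, False for 0x00..0x7F."""
    if not 0 <= drive <= 0xFF:
        raise ValueError("BIOS drive numbers are 8-bit values")
    return (drive & HDD_FLAG) != 0


# One bit test is equivalent to the range check over all 256 drive numbers.
print(all(is_hard_disk(d) == (d > 127) for d in range(256)))  # True
print(is_hard_disk(0x00), is_hard_disk(0x80))                 # False True
```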
Throughout this text, I will sometimes refer to registers and flags of the x86
processor, namely the ES and SS segment registers, the SP stack pointer register, and
two flags, the direction flag (DF) and the interrupt flag (IF). It's beyond the scope
of this text to go into the details of these registers. It might help to think
of them as storage locations holding values and defining the state of the processor.
There are obscure dependencies between flags, registers and the way the processor
behaves. You are left alone with that, sorry. Hopefully you need not understand the
details to grasp the gist of this text, but I cannot guarantee that.

------

Version 1.1
Stage 1, version 1.1, had a clear approach with respect to initialization and probing.
Drives with the HDD-Flag clear underwent floppy probing; all others were probed by INT
0x13, function 8. Obviously Erich was aware of the drive/media semantics problem of
this BIOS call.
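The version 1.1 decision can be summarized in a few lines. This is a paraphrase in
Python, not the original assembler; the two probing routines are passed in as
hypothetical callbacks, and the stub geometries are made up.

```python
from typing import Callable, Optional, Tuple

Geometry = Tuple[int, int, int]  # (cylinders, heads, sectors per track)
Prober = Callable[[int], Optional[Geometry]]


def probe_v1_1(drive: int, floppy_probe: Prober, bios_fn8: Prober) -> Optional[Geometry]:
    """Version 1.1 as described: the HDD-Flag alone picks the probing method."""
    if drive & 0x80:
        return bios_fn8(drive)   # hard disk: a fixed geometry can be trusted
    return floppy_probe(drive)   # floppy: ask the medium, not the drive


# Stand-ins for the two probing routines.
def floppy_stub(drive: int) -> Geometry:
    return (80, 2, 18)


def bios_stub(drive: int) -> Geometry:
    return (1024, 255, 63)


print(probe_v1_1(0x00, floppy_stub, bios_stub))  # (80, 2, 18)
print(probe_v1_1(0x80, floppy_stub, bios_stub))  # (1024, 255, 63)
```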
The initialization sequence put the processor into a known state: ES and the
direction flag DF were cleared. Yet Erich did not make sure that IF (interrupt flag)
was set; he relied on the BIOS/chainloader here (I have no confirmation that IF=1 is a
mandatory part of the interface between BIOS and stage 1, so one may see this as a
minor bug).
Floppy probing used memory location 0x0000:0x2000, the area behind the stack, as a
temporary disk buffer. It implicitly assumed ES to be zero. This precondition is
fulfilled in this version, but Erich failed to state it explicitly, so we will see two
later patches violating it.
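The dependency on ES comes from real-mode addressing: the linear address is
segment * 16 + offset, so the same offset 0x2000 lands somewhere else entirely once ES
holds anything but zero. The sketch models this (ignoring the A20 line, so addresses
wrap at 1 MB); the clobbered ES value 0xF000 is a made-up example.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


BUFFER_OFFSET = 0x2000  # the floppy probing buffer offset in version 1.1

print(hex(linear_address(0x0000, BUFFER_OFFSET)))  # 0x2000, as intended
print(hex(linear_address(0xF000, BUFFER_OFFSET)))  # 0xf2000, hypothetical clobbered ES
```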
The floppy probing has some shortcomings with respect to some (now practically unused)
floppy types like single-sided disks, but all in all it looks quite sound to me.

Version 1.3
This version moved the clearing of DF close to the instruction that depends on it.
Whilst this is a small improvement with respect to stability (prior BIOS calls
clobbering this flag will not affect the outcome any more), it weakens the
initialization sequence. We will see a later patch stumble over this.
This revision removes the initialization of register ES at the beginning. Now the
aforementioned precondition of the floppy probing is at stake. Most BIOSes will
still set ES to 0 upon calling the bootloader, but again, I have no confirmation that
this is mandatory.
A CLI/STI pair is introduced in the initialization sequence. Though this finally
ensures IF to be set, the usage is strange, as I pointed out in my patch #1. Perhaps
Erich saw a race condition when updating the stack pointer in SS:SP with two
instructions. If so, he was unaware of a special feature of the x86 processor: a write
to SS holds off interrupts until the following instruction has executed. Properly used
(loading SP right after SS), this automatically prevents a stale intermediate stack
pointer. A simple STI would have been sufficient here.

Version 1.5
This version was either made in a hurry, or the author had little insight into the
workings of stage 1. It introduced several bugs at the same time.
According to the comments, its objective was to include support for super disks. This
kind of hardware is an extension to floppy drives, allowing for huge disks of up to 120
MB.
Making this kind of hardware bootable posed a severe problem for the floppy probing:
one could no longer safely guess the geometry of the disk from the track size. So
this revision wanted all drives to be probed by INT 0x13, function 8. As a side-effect,
the result was susceptible to the drive/media semantics problem. Not caring about
this was bug #1 introduced by this revision.
Maybe the author thought the BIOS call fails on floppy drives (which it does not);
maybe some buggy Compaq boxes hanging on the floppy probing were the reason, as
Okuji-san claims. Either way, this revision effectively killed the floppy probing.
Instead of looking at register BL, sorting the super disk case out and passing the
floppy disk cases on to the floppy probing, it mysteriously invoked the floppy probing
only when the BIOS call failed completely. I am unable to envision a situation in which
this behaviour makes sense. [One has to see that INT 0x13, function 8, with floppy
drives, retrieves its information from the CMOS RAM in the battery-powered RTC rather
than probing the drive itself. So, unless the drive does not exist (is not registered
with the BIOS setup, to be precise), or the battery is exhausted, the BIOS call will
(almost) never fail.] To me, this program logic simply renders the floppy probing
useless. If this was intentional, one should rather have removed the floppy probing
completely. If there was really any sense in this construction, it should have been
commented. As it is, I'm very much inclined to see the program logic here as bug #2.
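To make the difference concrete, here is a paraphrase of the version 1.5 decision next
to the alternative of letting register BL decide. Everything here is a sketch: the
callbacks, the result type and the set of 'classic' BL floppy type codes (0x01 to 0x06,
as commonly documented for function 8) are assumptions, and the super disk type code
and geometry used in the test are hypothetical.

```python
from typing import Callable, NamedTuple, Optional, Tuple

Geometry = Tuple[int, int, int]  # (cylinders, heads, sectors per track)


class Fn8Result(NamedTuple):
    drive_type: int     # value returned in BL (floppy drives only)
    geometry: Geometry  # maximum geometry of the *drive*


Fn8 = Callable[[int], Optional[Fn8Result]]  # None means the call failed
FloppyProbe = Callable[[int], Optional[Geometry]]

# Assumption: BL codes for classic floppy drives (360K ... 2.88M).
CLASSIC_FLOPPY_TYPES = frozenset(range(0x01, 0x07))


def probe_v1_5(drive: int, bios_fn8: Fn8, floppy_probe: FloppyProbe) -> Optional[Geometry]:
    """Version 1.5 as described: floppy probing only if function 8 fails."""
    result = bios_fn8(drive)
    if result is None:
        return floppy_probe(drive)
    return result.geometry


def probe_by_drive_type(drive: int, bios_fn8: Fn8, floppy_probe: FloppyProbe) -> Optional[Geometry]:
    """Alternative: classic floppies are probed, everything else trusts function 8."""
    result = bios_fn8(drive)
    is_floppy = (drive & 0x80) == 0
    if result is None:
        return floppy_probe(drive) if is_floppy else None
    if is_floppy and result.drive_type in CLASSIC_FLOPPY_TYPES:
        return floppy_probe(drive)
    return result.geometry


# A 1.44 MB disk in a 2.88 MB drive (BL = 0x06).
def fn8_288(drive: int) -> Fn8Result:
    return Fn8Result(0x06, (80, 2, 36))


def probe_144(drive: int) -> Geometry:
    return (80, 2, 18)


print(probe_v1_5(0x00, fn8_288, probe_144))           # (80, 2, 36): drive, not media
print(probe_by_drive_type(0x00, fn8_288, probe_144))  # (80, 2, 18)
```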
This version tested the HDD-Flag using an AND instruction, thereby destroying the drive
number as a side-effect. This bug #3 was corrected in a later patch.
Finally, INT 0x13, function 8, modifies the ES register, leaving it pointing to the Disk
Parameter Table. This, again, breaks the precondition on the floppy probing
path. This is bug #4.

Version 1.10
This version moved all constants into a separate header file, renaming most of them.
There seems to have been some confusion about what a segment is. Two constants,
BUFFER_SEG and STACK_SEG, are effectively used as offsets in two places. Here, naming
and semantics of the constants do not coincide. Whereas the value STACK_SEG is used
only once (thus restricting the damage to misleading readers), BUFFER_SEG is used once
as a true segment and once as an offset, which is clearly a bug.
So, this revision moves the temporary disk buffer location of the floppy probing path 
from ????:0x2000 to ????:????, where ???? denotes an unintended random value 
(Remember: segment register ES was clobbered before).
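The effect of using one constant both as a segment and as an offset is easy to compute.
The value 0x7000 below is only an assumption for illustration; the point is that the two
uses address different places, and the second one additionally depends on whatever ES
happens to contain.

```python
BUFFER_SEG = 0x7000  # assumed value, for illustration only


def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


as_segment = linear_address(BUFFER_SEG, 0x0000)  # intended: BUFFER_SEG:0000
as_offset = linear_address(0x0000, BUFFER_SEG)   # 0000:BUFFER_SEG, only if ES were 0
print(hex(as_segment), hex(as_offset))           # 0x70000 0x7000
```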

Version 1.17
The LBA mode is coded in this revision. A then unused label 'lba_mode' is introduced,
denoting the beginning of the LBA loader. Its semantics and placement are not
clarified sufficiently, so later patches are misled.

Version 1.25
An optimization in the message routine introduces a LODS instruction, which depends on 
the uninitialized DF processor flag. The weakened processor initialization strikes 
back here.
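Why DF matters for LODS: after loading a byte from DS:SI, the instruction increments SI
when DF is clear and decrements it when DF is set. A toy model of a NUL-terminated
message loop (my own sketch, not the stage 1 routine) shows what an unexpected DF=1 does.

```python
def print_message(memory: bytes, si: int, df: int) -> bytes:
    """Collect bytes the way a LODSB loop would, stopping at a NUL byte."""
    out = bytearray()
    while True:
        if not 0 <= si < len(memory):
            raise IndexError("SI ran outside the modelled memory")
        al = memory[si]        # LODSB: AL = [DS:SI] ...
        si += -1 if df else 1  # ... then SI moves according to DF
        if al == 0:
            return bytes(out)
        out.append(al)


memory = b"\x00junk" + b"Geom\x00"     # the message starts at offset 5
print(print_message(memory, 5, df=0))  # b'Geom'
print(print_message(memory, 5, df=1))  # b'Gknuj'
```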

Version 1.33
The LBA probing mode is revised and a now unused chunk of code is removed. The 
misplaced/misnamed label 'lba_mode' is misinterpreted, so the removal is not complete. 
A superfluous instruction is left behind (cf. my patch #3).

Version 1.34
A patch location is placed in the middle of an instruction, which I consider a really bad
idea. Now any modification of the code will move that location around, and,
unfortunately, a constant in a separate header file depends on it! There is no
way to automatically update this implicit dependence on the outcome of a compiler run.
Such a construction simply means asking for trouble. In fact, I was already bitten by
it when playing with the code. More information can be found in my patch #4.
As a side-effect, it now becomes difficult to maintain any 'inter-version
compatibility'.

Wolf Lammen
_______________________________________________
Bug-grub mailing list
http://mail.gnu.org/mailman/listinfo/bug-grub
