I don't mean to bash it. I think it is a very impressive piece of work.
But based on the bugs I've been fixing I believe the Etherboot approach is
fundamentally flawed. The approach is to take a few working drivers, get
them working as polled drivers, and voila -- a snapshot of working kernel
drivers that will diverge over time as the kernel changes, and will need to
be constantly brought up to rev. It gets really messy as the fundamental
kernel interfaces also start to diverge from the snapshot the Etherboot
driver is based on, because then in addition to bringing a driver up to
rev you have to figure out how to reassemble the code from the kernel
driver structure to the Etherboot driver structure. I think this is going
to get to be real trouble in the long term. I've seen this happen before
on other OSes -- the idea of network boot is not new. The first network
boot I worked with was 20 years ago on PDP-11s, starting with Unix
drivers, and all the same difficulties started to creep in.

Stuff I've fixed
- The timer code flat out doesn't work on some embedded hardware.
  OK, this is fixable, and I have the patches. I should try to get
  that code back in.
- I've had to forward-rev some of the drivers for new PCI devids.
  OK, fixable, but I find the code is kind of messy at this point.
  The problem is these are old-rev drivers. So in some cases
  you are not going to be able to simply add "new devid for eepro100"
  and get a fixed etherboot driver, since the new device may bring with
  it new bugs that are only fixed in the new rev of the Linux kernel --
  over time, the working Linux driver is going to diverge in a big way
  from the Etherboot driver. I've seen some of this already.
- The change from 5.0.5 to 5.0.6 brought a new surprise: rtl8139 just
  flat out stopped working. Even when I patched it in
  with my working 5.0.5 driver it wouldn't work. Why? Don't know.
  The question is, do I want to put effort into fixing etherboot or
  making it so I can just boot Linux from flash and get fixed drivers? My
  decision was to boot Linux. This adds messiness -- compact flash --
  but our netboot using Linux also uses IP multicast, which allows us
  to boot clusters in sublinear time. Using Linux allows you to do things
  easily that are really hard with Etherboot-like tools. Yes, you can
  add more and more capability to Etherboot. At what point are you
  building a new mini-OS? It's a hard decision.

Etherboot is a neat piece of work. But I'm seeing the same problems in
Etherboot that I saw in the SunOS netboot approach 10 years ago -- an
increasing problem with divergence. It's not so bad right now, because
Etherboot is younger, but it is happening.

At some point, Sun actually started building their netboot directly from
the OS source tree. That helped a lot with the divergence problem,
although the netboot grew to 200K -- bigger than a small Plan 9 kernel.

But this might help for Etherboot. If Etherboot could build with Linux
drivers direct from the Linux source tree then divergence could be
minimized. But Etherboot might grow. Also, making this work is a
non-trivial problem: as Linux kernel interfaces change, Etherboot will
be affected too.

I don't know a good way out of this problem.

Sorry to be so annoying, I have to watch what I say. Email conveys
emotions you don't realize you're expressing. Especially when you write as
poorly as I do.

ron
