Thanks to everyone who responded.  I've learned a lot.  Let me say what
I know now and where I still have questions.

(1) how to boot:  I'll experiment with some of everyone's ideas tomorrow.
Again, though, my system boots from a SCSI disk; there is one big
RAID-0 filesystem striped across the two IDE disks.  In other words,
there is nothing required for booting on the RAID system, so the number of
cylinders on it should be irrelevant.  I think Robert Dehlem's suggestion
(disable the IDE disks in the BIOS settings and boot from SCSI) sounds
like the best bet, though the 'rdev' solution also sounds promising.

(2) proper shutdown:  I think the RAID system is shut down properly at
system shutdown.  I had failed to create the persistent superblock at
first but fixed that right away.  The failure of the RAID to get mounted
at system startup, which I had thought was due to corruption, I now
believe is explained by item (3) below.

(3) size discrepancies:  My detailed notes aren't in front of me, but as I
remember it, when I first installed these two drives (WD Caviar 20.5G
drives, by the way), I left the BIOS settings at their defaults, namely
auto-detect IDE devices.  When I did this, Linux (meaning fdisk) saw
slightly different sizes for the two disks, even though they are
physically identical.  This was because the BIOS was sizing the disks
differently.  Experimenting with manually setting the disk characteristics
in the BIOS showed I had two choices:  a large number of heads/sectors and
a small number of cylinders in LBA mode, or the setting I'm using,
16-63-39761 heads-sectors-cylinders in NORMAL mode.  This was the only
setting which resulted in fdisk seeing identical sizes for the two disks.
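
As a quick sanity check (a sketch only; the 512-byte sector size and
fdisk's 1K blocks are standard assumptions), that 16-63-39761 geometry
works out to the drives' nominal 20.5 GB:

```python
# Manually chosen BIOS geometry: heads, sectors/track, cylinders
heads, sectors, cylinders = 16, 63, 39761
sector_bytes = 512  # standard sector size

total_sectors = heads * sectors * cylinders
capacity_bytes = total_sectors * sector_bytes

print(total_sectors)         # 40079088 sectors
print(capacity_bytes / 1e9)  # ~20.5 GB, matching the WD Caviar rating
print(total_sectors // 2)    # 20039544 one-K blocks for the whole disk
```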

Using fdisk to set up first an extended partition and then a logical
partition occupying the whole of each disk resulted in partitions which
are reported as running from cylinder 1 to 39761;  fdisk reports the
logical partition is 20039481 blocks.  (16*63*39761/2 is 20039544, and
I don't understand why there is a difference.)  I created the RAID
using the sample
RAID-0 config file with raidtools-0.90 with the appropriate devices
(/dev/hda5, /dev/hdc5) and did mke2fs /dev/md0.  So far so good, no
errors.  However, the file system created is 40078944 blocks long, while
2*20039481 is 40078962 (and e2fsck thinks the physical size of the device
is 40078720 blocks).  The chunk size I chose for the RAID was 16k (really
the value in the sample file).  Now, int(20039481/16)*16 is 20039472, and
twice this is 40078944, the size of the file system mke2fs created.  So I
think the size of the file system was silently truncated to the largest
integer multiple of 16K (on each disk) which fit the partition I'd set up.  
But now, every time the system tries to mount the disk at startup, fsck
dies with an error because of the mismatch between the partition and file
system size, and the boot sequence drops me into a root shell to run fsck
manually.
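
The arithmetic above can be checked directly.  I believe the 63-block
shortfall is consistent with the usual one-track (63-sector) gap before
the extended partition plus another one-track gap before the logical
partition itself, 126 sectors = 63 one-K blocks in all; that offset
explanation is my assumption, though, not something fdisk reports.  The
chunk truncation, at least, matches exactly:

```python
whole_disk_blocks = 16 * 63 * 39761 // 2  # 20039544, from the geometry
partition_blocks = 20039481               # what fdisk reports for hda5/hdc5

# Assumption: the extended partition and the logical partition each
# start one 63-sector track past a boundary, costing 126 sectors
# = 63 one-kilobyte blocks in all.
print(whole_disk_blocks - partition_blocks)  # 63

# RAID-0 with a 16k chunk (16 one-K blocks) uses only whole chunks,
# so each member is silently rounded down to a chunk multiple.
chunk_blocks = 16
usable_per_disk = (partition_blocks // chunk_blocks) * chunk_blocks
print(usable_per_disk)      # 20039472
print(2 * usable_per_disk)  # 40078944, the size mke2fs reported
```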

So, what now?  In my opinion, mke2fs shouldn't have created a file system
which was a different size from the entire partition without saying
something.  I can't just take the risk of trimming the partition size to
match the file system size, because the size of the file system is not an
integer multiple of 504 blocks (the size of one 'cylinder').
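
For what it's worth, the numbers bear this out (a quick check; 504
blocks per cylinder is 16 heads * 63 sectors = 1008 sectors, at two
512-byte sectors per 1K block):

```python
cylinder_blocks = 16 * 63 // 2      # 1008 sectors = 504 blocks/cylinder
fs_blocks_per_disk = 40078944 // 2  # each disk's share of the striped fs

# A nonzero remainder means no whole number of cylinders can hold
# exactly the file system, so the partition can't be trimmed to match.
print(fs_blocks_per_disk % cylinder_blocks)  # 432
```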

--
Stephen Walton, Professor of Physics and Astronomy,
California State University, Northridge
[EMAIL PROTECTED]
