This is a summary of my SMP+IDE tests, done with several kernels. The important ones,
however, were all done with the same 2.2.13pre15 kernel binary.
The following tests used the 2.2.13pre15 SMP kernel (for applied patches see below):
what didn't work:
(1) dual P3 machine: kernel up for 10 mins, rm of two large files, result: NULL
dereference in unlink() (see other message)
(2) dual P3 machine: stress test on all five drives (on three controllers), result:
hard lockup after four hours (no oops etc.)
what works (I moved the 2 promise controllers and the 4 raid hds to a single CPU
machine):
(3) dual P3 machine: stress test on onboard-controller-hda only: works for 15 hours,
still running
(4) single P3 machine: exactly the same test as (2), but also works for 15 hours,
still running
As mentioned, (1) to (4) with the same (SMP) kernel binary.
One more pre15 test:
2.2.13pre15 with Unified IDE 2.2.13pre14-19991003 (two rejects in ide.c, one ok, one
probably harmless):
(5) dual P3 machine: NULL deref after 6 hours (i.e. this pre15 kernel survived longest)
Earlier test to check whether the dual P3 hardware works reliably with a non-SMP kernel:
(6) 2.2.13pre14+raid+large-disk+spinlock-patch-nonSMP: worked for 15 hours, then I
stopped it
Earlier tests (all on the dual P3 machine):
(7) 2.2.13pre14+UnifIDE-13pre12-19990925+raid+large-disk+spinlock-patch: worked for 6
hours, then NULL oops
(8) 2.2.13pre14+raid+large-disk: hard lockup after 30 mins
(9) 2.2.13pre14+raid+large-disk+spinlock-patch: hard lockup after 2.5h
I can think of these possible reasons for the SMP problems:
(A) SMP race(s) in IDE driver in original 2.2.13pre15
(B) SMP-deadlock in raid-2.2.11-patch
(C) problem with large disk patch (unlikely)
(D) hardware issue (seems unlikely since all works well with non-SMP kernel or machine)
Anyway, in the end I'll use the single P3 machine for the raid and export it to the SMP
machine via NFS.
Probably Red Hat or someone else with enough resources should do some intensive
SMP+IDE+RAID testing. Red Hat ships raid anyway, and people are tempted to use IDE
drives (think of the "i" in "raid"). If they do so on an SMP machine, this may be a
problem.
Further remarks:
- the test program writes pseudo-random data with multiple processes and verifies it
during read-back. In all tests there was not a single data error.
- unlink(): I later added a remove() to the test program (after read+verify), but the
unlink oops didn't occur again with 2.2.13pre15. The oops may not be directly
unlink()-related (it is possibly related to process startup/exit; I had a similar oops
with an earlier kernel >=2.2.12+UnifiedIDE while many processes were terminating and
unlink()s were in progress). This does not seem to be related to multiple drives
(assuming that [EMAIL PROTECTED] only uses one hd)
- the oopses are attached to my earlier messages; if necessary I can send more info
about hardware etc.
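
For anyone who wants to reproduce this kind of load, here is a minimal sketch of a test along the lines described above: several processes each write pseudo-random data, read it back and verify it, then remove the file. It is not the original test program. The names (chunks, write_verify_remove, run_workers), the 64 KB chunk size and the "stress.N" file names are all made up for illustration.

```python
import os
import random
import tempfile

CHUNK = 64 * 1024  # bytes per write; arbitrary value chosen for this sketch


def chunks(seed, total):
    """Yield a reproducible pseudo-random byte stream of `total` bytes."""
    rng = random.Random(seed)
    left = total
    while left > 0:
        n = min(CHUNK, left)
        yield rng.getrandbits(n * 8).to_bytes(n, "little")
        left -= n


def write_verify_remove(path, seed, total):
    """Write the stream, read it back and compare, then unlink the file.

    Returns the number of chunks that did not match on read-back.
    """
    with open(path, "wb") as f:
        for block in chunks(seed, total):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data towards the disk before reading
    errors = 0
    with open(path, "rb") as f:
        # Regenerate the same stream from the seed instead of keeping a copy.
        for block in chunks(seed, total):
            if f.read(len(block)) != block:
                errors += 1
    os.remove(path)  # the remove() step, done after read+verify
    return errors


def run_workers(directory, nproc, total):
    """Fork `nproc` workers on one directory; return how many of them failed."""
    pids = []
    for i in range(nproc):
        pid = os.fork()
        if pid == 0:  # child process
            code = 1
            try:
                path = os.path.join(directory, "stress.%d" % i)
                bad = write_verify_remove(path, seed=i, total=total)
                code = 1 if bad else 0
            finally:
                os._exit(code)  # never fall back into the parent's code
        pids.append(pid)
    failed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            failed += 1
    return failed
```

To load several drives at once, run_workers() would be started once per mount point, in a loop, with a total size well above the machine's RAM; otherwise the read-back is likely to be served from the page cache and never touch the disk.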
2.2.13pre15 kernel patches:
- raid 2.2.11
- large disk patch (for raid5 IBM 37GB drives)
- small cheat in pci.h to make kernel believe the promise66 controllers are promise33
- tasks.h: 4000
- compiled with gcc 2.7.2.3
- max files increased via proc (but there never were many files concurrently open)
- no unified ide patch for (1) to (4)
hardware:
- SMP, dual p3-450
- 2 x promise UDMA66
- 25GB drive as hda, on onboard controller as single master
- 37GB drives at hde, hdf, hdi, hdj