Matthias Bethke wrote:
Hi Jeroen,

To make a long story short (I'll tell it anyway for the archives' sake),
the problem seems solved so far. After recompiling the kernel with 100Hz
ticks I get a slight increase in latency when doing the same things as
yesterday on the server but it feels just as responsive as before.
Lacking any benchmarks from before the change, that's the best measure I
have. It might have been the firmware upgrade, but I doubt it. If I have
to reboot any time soon, maybe I'll do another test to verify it.
Rrright.. that's the big disadvantage of having to sort this stuff out on a production machine :) We're only now starting to convince (read: force) our developers *not* to put live code on production systems /before/ we have had a chance to test it.
You know how it goes...
on Wednesday, 2006-04-19 at 20:42:09, you wrote:
The slowness is the same on SuSE- and Gentoo-based clients. The previous
installation handled the same thing without any problems, which I'd
certainly expect from a dual Xeon @ 3 GHz with 4 GB RAM, a Compaq
SmartArray 642 U320 host adapter and some 200 GB in a RAID-5, connected
to the clients via GBit ethernet.
RAID-5? Ouch. RAID-10 offers much better raw performance; since the individual mirrors are striped, you get at least 4/3 the seek performance of a 4-disk RAID-5 set.

Yeah, but also at 2/3 the capacity. I know RAID5 isn't exactly
top-notch, but as long as the controller takes care of the checksumming
and distribution and the CPU doesn't have to, it's good enough for our
site. That's mostly students doing their exercises, web browsing, some
programming, usually all with small datasets. The biggest databases are
about two gigs and the disks write at just above 40 MB/s.
Okay, all valid, but still - that's not even close to the practical maximum of GbE (as opposed to the theoretical limit), which tops out at around 80-90 MB/sec.
You can and will reach that in linear reads from a RAID-10 :)
Not only that, but the I/O *latency* also decreases substantially when /not/ using RAID-5 - every small write on RAID-5 turns into a read-modify-write of data plus parity, which costs extra seeks and rotational latency.
As I think I already mentioned, but heh.

LoadAvg of over 10 for I/O only? That is a serious problem.
I repeat, that is a *problem*, not bad performance.

Huh? No, 9 to 11 seconds, i.e. ~10 MB/s. I don't see how this benchmark
could possibly bring my load up that much; after all, it's just one
process on the client and one on the server.
Okay, slight misunderstanding there, then.
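Still, it would be worth watching the disks on the server while a client actually feels slow. Something along these lines would do (just a rough sketch; the 5-second interval is arbitrary and the device shows up under whatever name the cciss driver gives it):

  # extended per-device stats every 5 seconds - watch the await and %util
  # columns for the array device while an NFS client feels sluggish
  iostat -x 5

If await and %util go through the roof while throughput stays low, the array is the bottleneck; if they stay sane, look at the network/NFS side instead.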

Since you say the box has 4GB of RAM, what happens when you do a linear read of 2 or 3 GB of data, first uncached and then cached?
That should not be affected by the I/O subsystem at all.

Writing gives me said 40 MB/s, reading it back (dd to /dev/null in 1 MB
chunks) is 32 MB/s uncached (*slower* than writes? Hm, controller
caching maybe...), ~850 MB/s cached.
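For anyone who wants to repeat this, roughly what such a test looks like (the file name is just an example, and dropping the page cache via /proc/sys/vm/drop_caches only exists in newer 2.6 kernels - it may well not be there on 2.6.14/15, in which case an unmount/remount between runs does the same):

  # write ~2 GB to the array, then make sure it's actually on disk
  dd if=/dev/zero of=/data/ddtest bs=1M count=2048
  sync
  # uncached read: flush the page cache first (newer 2.6 kernels only)
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/data/ddtest of=/dev/null bs=1M
  # cached read: simply run the same dd again right away
  dd if=/data/ddtest of=/dev/null bs=1M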
The 850 MB/sec is about what you'd expect when the data only has to go from memory through the VFS layer. But writing being faster than reading.. hmmm.. I remember that usually only happening with RAID-1 - but perhaps that's one of the idiosyncrasies of that controller. Still, it should at least reach the 30 MB/sec you get when using any other protocol.
Also, test your network speed by running netperf or iperf between client and server.
Get some baseline values for maximum performance first!
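Something along these lines gives you a clean baseline (hostname is just a placeholder):

  # on the NFS server
  netserver
  # on a client: 30-second TCP bulk-transfer test against the server
  netperf -H nfsserver.example.org -l 30
  # iperf works just as well: 'iperf -s' on the server,
  # 'iperf -c nfsserver.example.org' on the client

If that lands in the 80-90 MB/sec range, the network is fine and you can stop suspecting the NIC and driver.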

I didn't test it as the only thing I changed was the server software and
it was just fine before. And it *is* fine as long as the server disks
aren't busy. Theoretically it could be that the Broadcom NIC driver
started sucking donkey balls in kernel 2.6, but ssh and stuff are just
fine and speedy (~30 MB/s for a single stream of zeroes).
Still only 300 Mbit/s, though... a server like that should be able to handle SSH at near line speed.
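If you want to see how much of that is crypto overhead, try the same thing once with the default cipher and once with a cheap one (hostname is a placeholder):

  # a gigabyte of zeroes over ssh, timed by dd's own summary
  dd if=/dev/zero bs=1M count=1024 | ssh nfsserver.example.org 'cat > /dev/null'
  # same again with arcfour, which takes most of the cipher cost out
  dd if=/dev/zero bs=1M count=1024 | ssh -c arcfour nfsserver.example.org 'cat > /dev/null'

If the arcfour run is noticeably faster, the 300 Mbit/s is the CPU encrypting, not the network.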
And more bla I don't understand about NFS - what about the basics?
Which versions are the server and client running?
Since both could run either v2 or v3 and in-kernel or userspace, that's 4 x 4 = 16 possible combinations right there - and that is assuming they both run the *same* minor versions of the NFS software.

It's v3, that's why I snipped the unused v2 portions of nfsstat output.
Both server and client are in-kernel---the client could only be
userspace via FUSE, right?---and the latest stable versions,
nfs-utils-1.0.6-r6, gentoo-sources-2.6.15-r1 on the client and
hardened-sources-2.6.14-r7 on the server.
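For the record, the negotiated mount options are easy to double-check on the client, e.g.:

  # lists the options each NFS mount actually ended up with
  # (rsize/wsize, udp vs tcp, and so on)
  grep nfs /proc/mounts
  # nfsstat -m prints much the same summary

and the version/transport can be pinned explicitly at mount time with something like 'mount -t nfs -o vers=3,tcp,rsize=32768,wsize=32768 server:/export /mnt' - the exact values there being just examples.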

Okay, forget I mentioned it - you seem to have that part covered :)
And one parameter I haven't tried to tweak is the I/O scheduler. I seem
to remember a recommendation to use noop for RAID5 as the cylinder
numbers are completely virtual anyway so the actual head scheduling
should be left to the controller. Any opinions on this?
I have never heard of the I/O scheduler being able to influence or get data directly from disks. In fact, as far as I know that is not even possible with IDE or SCSI, which both have their own abstraction layers. What you probably mean is the way the scheduler is allowed to interface with the disk subsystem - which is solely determined by the disk subsystem itself.

OK, that was a bit misleading. What I meant is that the assumptions the
scheduler makes about the flat block device it sees, like offsets
corresponding more or less linearly to cylinders
That *may* be true for old IDE drives, but it isn't even remotely true for SCSI, which is its own higher-level abstraction on top of the physical drive interface already, not to mention the layer the SmartArray puts on top of /that/.
---which is what it assumes in order to implement things like the
elevator algorithm---are virtually always right for simple drives but
may not be for a RAID.
Erm.. that's what I said :)
One thing I've noticed when working with SmartArray controllers is that they truly are rather smart, i.e. it's hard to peek under the hood and understand exactly what is happening.
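That said, trying noop is cheap enough - it can be switched at runtime, something like this (assuming the array shows up via the cciss driver as c0d0 and the other schedulers are compiled into your kernel; adjust the device name as needed):

  # see which schedulers are available and which one is active
  cat '/sys/block/cciss!c0d0/queue/scheduler'
  # e.g.: noop [anticipatory] deadline cfq
  # switch that queue to noop on the fly, no reboot needed
  echo noop > '/sys/block/cciss!c0d0/queue/scheduler'

Whether it actually buys you anything is something only a before/after benchmark will tell, but with the controller doing its own reordering and caching it shouldn't hurt.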

Unless the money has to come out of a tight budget, I would seriously recommend you invest in A. the battery-backed write cache (BBWC) available for most SmartArray controllers (128 MB of r/w cache), and/or B. a slew of 146 GB 10k rpm drives to fuel a RAID-10 set of, say, 300 GB (4 drives).
It's a hell of a lot more redundant as well ...

Good luck!

J

--
[email protected] mailing list
