Hi,
In the meantime I've managed to solve the issue, so I thought I'd post the
solution here for the record and to help others who might face similar
problems. It turned out that the problem originates in the aacraid
kernel driver itself, in which a default was changed somewhere between
2.6.27 and 2.6.32, here:
commit d8e965076514dcb16410c0d18c6c8de4dcba19fc
Author: Leubner, Achim <[email protected]>
Date:   Wed Apr 1 07:16:08 2009 -0700

    [SCSI] aacraid driver update

    changes:
    - set aac_cache=2 as default value to avoid performance problem
      (Novell bugzilla #469922)
Backporting that patch or explicitly passing the "cache=2" parameter to the
aacraid module did the trick, and write performance became reasonably high again.
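For reference, a minimal sketch of how I made the parameter stick (assuming a
distro that reads /etc/modprobe.d/; the file name is just what I picked):

    # /etc/modprobe.d/aacraid.conf -- example file, name is arbitrary
    # use the same cache mode that newer kernels default to
    options aacraid cache=2

After that, regenerate the initramfs if the driver is loaded from there, and
reload the module (or reboot) so the option takes effect.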
Why on earth only my specific setup with RAID5+DRBD triggered this
problem (and not native RAID5, or DRBD on any other RAID level), I have no
idea, though. Any input from someone more familiar with how DRBD and the
aacraid driver work internally would be welcome.
greets,
Peter
Peter Gyongyosi wrote:
Hi,
I've got a strange performance problem when using DRBD on top of a
hardware RAID5 array, and I'd like to ask for your input on the issue.
I'm trying to use DRBD on a box with an Adaptec 5405 controller and a
bunch of attached SATA disks. When I create RAID0 or RAID1 arrays from
the disks, everything works fine. However, when I create a RAID5 array,
the write performance drops drastically. Using the measurement suggested
in
http://www.drbd.org/users-guide-emb/ch-benchmark.html#s-measure-throughput
it seems that the write speed drops to about one tenth of the raw write
speed provided by the underlying RAID array (to about 20 MB/s from
the raw 200 MB/s).
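For the record, the measurement itself was roughly the direct-I/O dd test that
page describes (the device name and sizes are just from my setup):

    # write straight to the DRBD device, bypassing the page cache
    dd if=/dev/zero of=/dev/drbd0 bs=1M count=512 oflag=direct

The raw 200 MB/s figure came from the same kind of test run against the backing
device before DRBD was set up on it.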
In my testing scenario, no peer node is connected to the DRBD resource,
so network latency cannot be the bottleneck, and I'm using the whole
/dev/sda device exported by the RAID controller as the backing device,
with internal metadata. I've also tried putting the metadata on a separate
RAID1 array, but that did not help either. I've tried every suggestion
(no-disk-barriers, playing with I/O schedulers, etc.) in the "Optimizing
DRBD performance" section of the manual, with only minor effects.
Installing the same DRBD configuration on a software RAID setup made
from the same disks works just fine, without the performance hit.
Now here comes the weird part: I've also tried installing a stock Ubuntu
Lucid Lynx (with kernel 2.6.32 and DRBD 8.3.7) on the same hardware and
playing around with it. It turns out that on Lucid the DRBD performance on
a RAID5 array is much, much better: there's only a 20% drop in write
speed compared to the raw speed, which is still a lot, but it's
acceptable for me since it's still above the limit imposed by the GigE link
that will sit between the two nodes in the final setup.
Unfortunately, I need to use this setup with a customized distro, where
simply changing the kernel to the latest and greatest 2.6.32 with
DRBD 8.3.7 would take tons of work and should only be a last resort.
I've browsed through the DRBD changelogs between versions 8.3.0 and
8.3.7 but didn't find anything relevant to my problem. My question is:
do any of you know of any particular change, patch, different
default setting or anything else that changed either in the DRBD code
between 8.3.0 and 8.3.7 or in the kernel between 2.6.27 and 2.6.32 and
that could be relevant to this situation? What could have changed that _only_
affects DRBD write speed on a HW RAID5 array (and not on HW RAID1, not
on software RAID5, and not the raw speed without DRBD)?
To summarize it again:
kernel 2.6.27 + drbd 8.3.0:
- raw HW RAID5: fast
- DRBD on HW RAID1 or RAID0: fast
- DRBD on software RAID5: fast
- DRBD on HW RAID5: slow  <----------- that's the difference and my problem :)
kernel 2.6.32 + drbd 8.3.7:
- raw HW RAID5: fast
- DRBD on HW RAID1 or RAID0: fast
- DRBD on software RAID5: fast
- DRBD on HW RAID5: fast
thanks,
Peter