Hi Everyone,
I have a DRBD performance problem that has me completely confused. I'm
hoping someone can help with this one, as my other servers that use the
same type of RAID card and DRBD don't have this problem.
For hardware, I have two Dell R515 servers with the H700 card (basically an
LSI MegaRAID-based card), both running SLES 11 SP1. The problem shows up on
DRBD 8.3.11, 8.3.12, and 8.4.1; I haven't tested other versions.
Here is the simple config I made, based on the one used by the servers that
don't have any issues:
global {
  # We don't want to be bothered by the usage count numbers
  usage-count no;
}
common {
  protocol C;
  net {
    cram-hmac-alg md5;
    shared-secret "P4ss";
  }
}
resource r0 {
  on san1 {
    device /dev/drbd0;
    disk /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2;
    address 10.60.60.1:63000;
    flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part1;
  }
  on san2 {
    device /dev/drbd0;
    disk /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part2;
    address 10.60.60.2:63000;
    flexible-meta-disk /dev/disk/by-id/scsi-36782bcb0698b6e00167bb1d107a77a47-part1;
  }
  startup {
    wfc-timeout 5;
  }
  syncer {
    rate 50M;
    cpu-mask 4;
  }
  disk {
    on-io-error detach;
    no-disk-barrier;
    no-disk-flushes;
    no-disk-drain;
    no-md-flushes;
  }
}
And here is the current /proc/drbd output:
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by phil@fat-tyre, 2011-06-29 11:37:11
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----s
    ns:0 nr:0 dw:8501248 dr:551 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n oos:3397375600
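The status line is what confirms the tests below ran without replication: cs:WFConnection with a peer role of Unknown means no peer is connected. A minimal sketch that pulls the key:value fields out of the /proc/drbd lines quoted above; the regex is an assumption that fits this 8.3 output format and may need adjusting for other versions. The dw (disk write) and oos (out of sync) counters are reported in KiB.

```python
import re

# Copied from the /proc/drbd output above (DRBD 8.3.11, resource 0).
STATUS = (
    " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----s\n"
    "    ns:0 nr:0 dw:8501248 dr:551 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:n"
    " oos:3397375600\n"
)

# Assumption: each field is a lowercase key, a colon, and a value without
# whitespace. The leading "0:" has a numeric key, so it is not captured.
FIELD = re.compile(r"\b([a-z]+):(\S+)")

KIB_PER_GIB = 1024 * 1024  # dw and oos are counted in KiB


def parse_status(text: str) -> dict:
    """Return the key:value fields of a /proc/drbd resource entry as strings."""
    return dict(FIELD.findall(text))


fields = parse_status(STATUS)
local_role, peer_role = fields["ro"].split("/")
local_disk, peer_disk = fields["ds"].split("/")

print(f"connection state: {fields['cs']}")  # WFConnection: peer not connected
print(f"roles: local={local_role} peer={peer_role}")
print(f"disks: local={local_disk} peer={peer_disk}")
print(f"written locally: {int(fields['dw']) / KIB_PER_GIB:.1f} GiB")
print(f"out of sync:     {int(fields['oos']) / KIB_PER_GIB:.0f} GiB")
```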
So, even when I'm running on just one server with no replication, the
performance hit from DRBD is huge. Writing directly to the backing device gives:
----
san1:~ # dd if=/dev/zero of=/dev/disk/by-id/scsi-36782bcb0698b6300167badae13f2884d-part2 bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 16.4434 s, 1.0 GB/s
----
san1:~ # dd if=/dev/zero of=/dev/drbd/by-res/r0 bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 93.457 s, 184 MB/s
----
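To put a number on "huge": both dd runs wrote the same 16384 x 1 MiB, so the ratio of the elapsed times is the slowdown. A minimal sketch that redoes dd's arithmetic from the figures quoted above, assuming dd's MB/s is decimal (10^6 bytes), which matches its "17 GB" for 17179869184 bytes. On these numbers the DRBD write path is roughly 5.7x slower.

```python
# Figures copied from the two dd runs above.
BYTES_WRITTEN = 16384 * 1024 * 1024  # count=16384, bs=1M -> 17179869184 bytes

# Elapsed seconds reported by dd for each target.
RUNS = {
    "backing partition": 16.4434,   # .../scsi-...-part2 written directly
    "/dev/drbd/by-res/r0": 93.457,  # same amount written through DRBD
}


def mb_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes per second, the unit dd prints."""
    return nbytes / seconds / 1_000_000


rates = {name: mb_per_s(BYTES_WRITTEN, secs) for name, secs in RUNS.items()}
for name, rate in rates.items():
    print(f"{name:>20}: {rate:7.1f} MB/s")

slowdown = rates["backing partition"] / rates["/dev/drbd/by-res/r0"]
print(f"DRBD write path is {slowdown:.1f}x slower")
```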
Using iostat, I can see part of the problem:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00   16.76    0.00    0.00   83.17

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb           20565.00         0.00       360.00          0        719
drbd0        737449.50         0.00       360.08          0        720

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00   28.87    1.37    0.00   69.69

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.50         0.00         0.01          0          0
sdb           57859.50         0.00       177.22          0        354
drbd0        362787.00         0.00       177.14          0        354
The DRBD device is showing a TPS many times that of the backing store
(roughly 6x and 36x in the two samples above). When I run the same test on my
other servers I don't see anything like it, and those working servers run the
same kernel and DRBD versions.
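Since tps counts transfers per second, dividing each device's MB_wrtn/s by its tps gives the average transfer size, which shows where the extra tps comes from. A minimal sketch over the iostat rows quoted above, assuming iostat's MB means 1,048,576 bytes (2048 sectors). On these numbers drbd0 averages about 512 bytes (a single sector) per transfer in both samples, while sdb averages roughly 18 KB and 3 KB. This is only arithmetic on the posted figures; it does not by itself explain why drbd0 is seeing such small requests.

```python
# (device, tps, MB_wrtn/s) rows copied from the two iostat samples above.
SAMPLES = [
    [("sdb", 20565.00, 360.00), ("drbd0", 737449.50, 360.08)],
    [("sdb", 57859.50, 177.22), ("drbd0", 362787.00, 177.14)],
]

# Assumption: iostat's MB columns count 1 MiB (2048 sectors) per MB.
BYTES_PER_MB = 1024 * 1024


def avg_transfer_bytes(tps: float, mb_written_per_s: float) -> float:
    """Average bytes moved per transfer: write throughput divided by tps."""
    return mb_written_per_s * BYTES_PER_MB / tps


for n, sample in enumerate(SAMPLES, start=1):
    sizes = {dev: avg_transfer_bytes(tps, mb) for dev, tps, mb in sample}
    tps_ratio = sample[1][1] / sample[0][1]  # drbd0 tps divided by sdb tps
    print(f"sample {n}: sdb {sizes['sdb']:.0f} B/transfer, "
          f"drbd0 {sizes['drbd0']:.0f} B/transfer, tps ratio {tps_ratio:.1f}x")
```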
Does anyone have any ideas on how this might be fixed? I'm at a loss right
now.
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user