Hi everyone,
We have a fairly simple setup: two CentOS 5.5 nodes running Xen 3.4.2 compiled
from source (kernel 2.6.18-xen) and DRBD 8.3.7, also compiled from source. Both
nodes have two data partitions, which are synced by DRBD. Each node runs a
single VM from one of the two partitions in standard Primary/Secondary mode.
This way each node can fully utilize its CPU and memory resources and we still
have storage failover capabilities. The VMs use the DRBD devices directly (no
LVM or similar). The nodes are connected through a gigabit Ethernet port and a
crossover cable.
Over time, as VM resource usage grew, the setup started behaving strangely.
After investigating, everything points to an I/O problem: reads and writes are
very slow.
My tests have shown that while DRBD replication is connected and running, I/O
performance is very bad. It is bad not only inside the VM but also on the host
node. It is as if DRBD were causing the underlying I/O subsystem to become very
slow. I should add that the servers use Adaptec 5405 RAID cards with BBUs and
write cache enabled. As for disks, we have 4x SATA drives configured as RAID-10.
As soon as I disconnect DRBD, I/O performance is much better both inside and
outside the VMs.
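One way to put numbers on the connected/disconnected difference is to time
small synchronous writes. The following is a minimal sketch, not the tests
referred to above: it assumes Python 3 is available, and the function name
timed_sync_writes and the target path are hypothetical.

```python
import os
import time


def timed_sync_writes(path, count=100, block_size=4096):
    """Write `count` blocks to `path`, fsync after each one,
    and return the mean time per write in seconds."""
    block = b"\0" * block_size
    # Hypothetical target: a scratch file on the storage under test.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)  # force each block out before timing the next
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return elapsed / count
```

Running it once with the resource connected and once after disconnecting it
should give two mean latencies that can be compared directly.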
Xen VM config:
disk = [ 'drbd:drbd0,sda,w' ]
# drbdsetup /dev/drbd1 show
disk {
    size 0s _is_default; # bytes
    on-io-error detach;
    fencing dont-care _is_default;
    no-disk-barrier ;
    no-disk-flushes ;
    no-md-flushes ;
    max-bio-bvecs 0 _is_default;
}
net {
    timeout 60 _is_default; # 1/10 seconds
    max-epoch-size 8192;
    max-buffers 8192;
    unplug-watermark 128 _is_default;
    connect-int 10 _is_default; # seconds
    ping-int 10 _is_default; # seconds
    sndbuf-size 0 _is_default; # bytes
    rcvbuf-size 0 _is_default; # bytes
    ko-count 0 _is_default;
    cram-hmac-alg "sha1";
    shared-secret "secret";
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect _is_default;
    rr-conflict disconnect _is_default;
    ping-timeout 5 _is_default; # 1/10 seconds
}
syncer {
    rate 33792k; # bytes/second
    after -1 _is_default;
    al-extents 1801;
    verify-alg "crc32c";
}
protocol C;
_this_host {
    device minor 1;
    disk "/dev/sda7";
    meta-disk internal;
    address ipv4 10.10.0.1:7789;
}
_remote_host {
    address ipv4 10.10.0.2:7789;
}
I have also noticed that the 'lo' and 'ua' values in /proc/drbd are usually
fairly high. Also, the activity log update counter ('al') is increasing fairly
rapidly, at about 10 updates a second.
# On the primary node
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1912226456 nr:0 dw:1406464980 dr:503249931 al:153036012 bm:3232164 lo:0
pe:36 ua:0 ap:35 ep:1 wo:d oos:0
# Secondary node
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:10502904 dw:1911380520 dr:0 al:0 bm:45648 lo:38 pe:0 ua:38 ap:0
ep:1 wo:d oos:0
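To track how fast a counter such as 'al' grows, two samples of the status text
above can be parsed and compared. This is only a sketch under assumptions: it
assumes Python 3, the helper names parse_drbd_counters and rate_per_second are
hypothetical, and it handles the text of a single resource at a time.

```python
import re


def parse_drbd_counters(status_text):
    """Return {name: int} for the numeric key:value fields of one
    resource's /proc/drbd entry (cs:, ro:, ds: and wo: are skipped
    because their values are not numeric)."""
    pairs = re.findall(r"\b([a-z]{2,3}):(\d+)\b", status_text)
    return {name: int(value) for name, value in pairs}


def rate_per_second(before, after, key, interval_s):
    """Average increase per second of counter `key` between two samples
    taken `interval_s` seconds apart."""
    return (after[key] - before[key]) / interval_s


# The primary node's status as quoted above.
primary = """1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:1912226456 nr:0 dw:1406464980 dr:503249931 al:153036012 bm:3232164 lo:0
pe:36 ua:0 ap:35 ep:1 wo:d oos:0"""

counters = parse_drbd_counters(primary)
```

Taking a second sample a known number of seconds later and passing both
dictionaries to rate_per_second with key "al" gives the update rate directly.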
Any ideas?
Thanks
--
Jean-Francois Chevrette