Hi everyone,

We have a fairly simple setup: two CentOS 5.5 nodes running Xen 3.4.2 compiled 
from source (kernel 2.6.18-xen) and DRBD 8.3.7, also compiled from source. 
Both nodes have two data partitions which are synced by DRBD. Each node runs a 
single VM from one of the partitions in standard Primary/Secondary mode. This 
way each node can fully utilize its CPU and memory resources and we still have 
storage failover capabilities. The VMs use the DRBD devices directly (no LVM 
or similar). The two nodes are connected to each other through a gigabit 
Ethernet port and a crossover cable.

Over time, as the VMs' resource usage rose, the VMs started behaving strangely. 
After investigating, everything points to an I/O problem, as reads and writes 
are very slow.

My tests have shown that while DRBD replication is connected and running, 
I/O performance is very bad, not only inside the VM but also on the host 
node. It is as if DRBD caused the underlying I/O subsystem to become very 
slow. I should mention that the servers use Adaptec 5405 RAID cards with BBUs 
and write cache enabled. As for disks, we have 4x SATA drives configured as 
RAID-10.

As soon as I disconnect DRBD, I/O performance is much better both inside and 
outside the VMs.
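
For anyone who wants to reproduce the connected vs. disconnected comparison, 
here is a rough sketch of a timing test. It is not the exact test I ran. It 
assumes a Python 3 interpreter is available (the stock Python on CentOS 5 is 
older), and the function name, the default sizes and the target path are all 
placeholders chosen for illustration. It writes small blocks to a file and 
calls fsync at the end, so it only gives a coarse throughput figure.

```python
import os
import sys
import time


def timed_sync_write(path, total_bytes=64 * 1024 * 1024, block_size=4096):
    """Write total_bytes to path in block_size chunks, fsync, return MB/s.

    Hypothetical helper for a coarse comparison only; it is not a
    replacement for a proper benchmark tool.
    """
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            # os.write returns the number of bytes actually written
            written += os.write(fd, block)
        # Force the data down to the block device before stopping the clock
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Usage: python3 write_test.py /path/on/the/filesystem/under/test
    if len(sys.argv) == 2:
        print("%.1f MB/s" % timed_sync_write(sys.argv[1]))
```

Running it once with DRBD connected and once disconnected, against a file on 
the same filesystem, should show whether the gap is as large as described.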

Xen VM config:
disk = [ 'drbd:drbd0,sda,w' ]

# drbdsetup /dev/drbd1 show
disk {
        size                    0s _is_default; # bytes
        on-io-error             detach;
        fencing                 dont-care _is_default;
        no-disk-barrier ;
        no-disk-flushes ;
        no-md-flushes   ;
        max-bio-bvecs           0 _is_default;
}
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          8192;
        max-buffers             8192;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             0 _is_default; # bytes
        rcvbuf-size             0 _is_default; # bytes
        ko-count                0 _is_default;
        cram-hmac-alg           "sha1";
        shared-secret           "secret";
        after-sb-0pri           discard-zero-changes;
        after-sb-1pri           discard-secondary;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
}
syncer {
        rate                    33792k; # bytes/second
        after                   -1 _is_default;
        al-extents              1801;
        verify-alg              "crc32c";
}
protocol C;
_this_host {
        device                  minor 1;
        disk                    "/dev/sda7";
        meta-disk               internal;
        address                 ipv4 10.10.0.1:7789;
}
_remote_host {
        address                 ipv4 10.10.0.2:7789;
}

I have also noticed that the 'lo' and 'ua' values were usually fairly high in 
/proc/drbd. Also, the activity log update counter ('al') is increasing fairly 
rapidly, at 10 updates a second.

# On the primary node
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1912226456 nr:0 dw:1406464980 dr:503249931 al:153036012 bm:3232164 lo:0 pe:36 ua:0 ap:35 ep:1 wo:d oos:0

# Secondary node
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:10502904 dw:1911380520 dr:0 al:0 bm:45648 lo:38 pe:0 ua:38 ap:0 ep:1 wo:d oos:0
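
In case it helps with tracking these counters over time, below is a minimal 
sketch of how the numeric fields in a /proc/drbd resource block could be 
parsed and turned into a per-second rate. The function names are made up for 
this example, it assumes Python 3, and it only picks up the numeric key:value 
pairs (ns, nr, dw, dr, al, bm, lo, pe, ua, ap, ep, oos); text fields such as 
cs, ro, ds and wo are ignored.

```python
import re


def parse_drbd_counters(text):
    """Return {name: int} for the numeric key:value pairs in a /proc/drbd block."""
    counters = {}
    # Keys are 2-3 lowercase letters followed by ':' and digits, e.g. "al:153036012"
    for key, value in re.findall(r"\b([a-z]{2,3}):(\d+)\b", text):
        counters[key] = int(value)
    return counters


def rate_per_second(before, after, key, interval_s):
    """Change of one counter between two samples, divided by the sample interval."""
    return (after[key] - before[key]) / interval_s


if __name__ == "__main__":
    # Sample taken from the primary node output above
    sample = (
        "1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----\n"
        "    ns:1912226456 nr:0 dw:1406464980 dr:503249931 al:153036012 "
        "bm:3232164 lo:0 pe:36 ua:0 ap:35 ep:1 wo:d oos:0\n"
    )
    counters = parse_drbd_counters(sample)
    print(counters["al"], counters["pe"], counters["ap"])
```

Taking two samples a known number of seconds apart and passing them to 
rate_per_second with key "al" gives the activity log update rate.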


Any ideas?

Thanks
--
Jean-Francois Chevrette

