I'm preparing a classic 2-node cluster and cannot understand the results I'm 
getting for write throughput.

Hardware & software:
* Dell PowerEdge R300 servers, 1x Xeon X3323 (2.50GHz quad-core), 16GiB RAM, 
Dell SAS/6 (LSI SAS1068E) PCIe x8 controller, 2x Seagate Barracuda ES.2 SAS 
750GB disks
* Broadcom NetXtreme II BCM5709 PCIe x4 dual gigabit card (rx and tx checksum 
offload, scatter-gather, tcp segmentation offload), with latest bnx2 driver 
v1.9.20b (MSI-X interrupts)
* Linux-2.6.30.9 from 'vanilla' sources. But I got exactly the same results 
with XenLinux-2.6.18.8 under Xen-3.4.2 hypervisor (the cluster will eventually 
end up running Xen)
* DRBD-8.0.16 (I wasn't able to get any 8.3.x working with XenLinux-2.6.18.8... 
gentoo portage will compile 8.3.6 but then it goes Oooops)
* I got the same results when testing drbd-8.3.6 with linux-2.6.30.9 (with the 
same drbd.conf I used with 8.0.16)

Configuration:
* linux software raid1 partition for Domain-0 filesystem (6GB), swap partition 
(0.5G), drbd partition (~690G)
* two 'cross' connections between the NetXtremeII dual gigabit cards, jumbo 
frames (MTU=9000), balance-rr bonding
* drbd is in Primary/Secondary mode
* all benchmarks were made on a single ext3 filesystem made on the whole device 
(/dev/sda4 for raw disk, /dev/md1 for raid0, /dev/drbd0 for drbd), mounted 
noatime
* all benchmarks were repeated many times (16 runs for bonnie++, 8 runs for dd) 
and variance was always negligible, I got really consistent results between runs
* all drbd benchmarks were made with the resource UpToDate and connected
* I did an mkfs.ext3 and a reboot prior to any bonnie++ benchmark
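For reference, the bonding setup amounts to something like the following 
sketch (the file path, interface names and miimon value are assumptions for 
illustration, not copied from the actual systems):

```
# /etc/modprobe.d/bonding.conf (path and miimon value assumed)
alias bond0 bonding
options bonding mode=balance-rr miimon=100

# then enslave both NetXtreme II ports and raise the MTU, e.g.:
#   ifenslave bond0 eth0 eth1
#   ip link set bond0 mtu 9000
```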

Baseline performance was exactly as expected:
* netperf measured 1968MBit/s through the bonded 2x1GBit link (and 990MBit/s 
with one cable disconnected)
* dd and bonnie++ show 116MiB/s read and 98MiB/s write throughput on the single 
disk (fs on /dev/sda4)
* dd and bonnie++ show 214MiB/s read and 190MiB/s write throughput on software 
raid0 (fs on /dev/md1 made up of sda4 and sdb4)
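The sequential-write figures come from dd runs along these lines (a 
scaled-down sketch; the real tests used bs=512M count=1 against the ext3 
mount, and the target path here is just a placeholder):

```shell
#!/bin/sh
# Sequential-write throughput probe (scaled down to 64MiB for illustration).
# conv=fdatasync makes dd flush to disk before reporting, so the page cache
# does not inflate the MB/s figure.
TARGET=/tmp/dd-write-test.bin
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fdatasync
SIZE=$(stat -c %s "$TARGET")   # should be 64MiB = 67108864 bytes
echo "wrote $SIZE bytes"
rm -f "$TARGET"
```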

Synchronization performance with drbd is very good:
* raising the syncer rate I was able to get the two nodes syncing both disks 
(sda/srv1 -> sda/srv2, and the opposite sdb/srv2 -> sdb/srv1) at the same time 
at >80MiBytes/sec (nearly 700MBit/sec in each direction on the network link); 
both disks were UpToDate in a little over 2 hours.
* that should confirm the systems have no bus bottleneck on either the network 
or the SAS cards

When using DRBD, read throughput is around 116MiB/s as expected, but write 
throughput is much lower. The next figures were taken using dd with bs=512M and 
count=1.
- with the default configuration (no relevant options in drbd.conf) I get 
48MB/sec with protocol C and 49MB/sec with protocol A
- increasing max-buffers and max-epoch-size (set to equal values) actually 
yields lower throughput: with protocol C it goes from 48MB/s (2048, the 
default) to 42MB/s (4000) to 39MB/s (8000)
- increasing sndbuf-size to 512k or 1024k does not change anything
- changing unplug-watermark from a minimum of 16 up to a maximum equal to 
max-buffers only produces negligible changes in write performance (+/- 1MiB/sec)
- the above holds true for any combination: sndbuf-size does not change 
performance with any protocol and any buffers and any unplug-watermark tried, 
and an increase in max-* always degrades performance
- the good thing is, write speed on resource #1 from node1 does not slow down 
even while syncing resource #2 from node2 to node1 (as expected, by the way)
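To make the tested knobs concrete, the resource looked roughly like this 
(a reconstruction in drbd 8.x syntax from the values above, not a copy of the 
actual drbd.conf; the resource name is a placeholder):

```
resource r0 {
  protocol C;              # also tried A; nearly the same result (48-49MB/s)
  syncer { rate 80M; }     # resync throttle, raised for the initial sync
  net {
    max-buffers      2048; # default; 4000/8000 lowered write throughput
    max-epoch-size   2048; # kept equal to max-buffers
    sndbuf-size      512k; # 512k/1024k: no measurable change
    unplug-watermark  128; # tried 16 .. max-buffers: +/- 1MiB/s at most
  }
}
```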

DRBD documentation says I should expect nearly the same sequential write 
throughput with and without drbd, but I'm getting less than half of that. I 
tried tweaking the config without success; every change either had no effect 
or made things worse. Switching between the old and new kernels, or between 
the old and new drbd releases, doesn't seem to make any difference.

I'd like to know if I'm correct in assuming I should get something around 
90MiB/sec of write throughput even with drbd, and whether I'm doing anything 
wrong in configuring drbd or in benchmarking it. Any help will be appreciated, 
thanks.

My plan was to go look at latency figures after checking out throughput, 
following your documentation, but for reference, I get ~8.4ms latency with drbd 
and ~8.3ms directly on disk, which I think are correct and expected numbers 
from these systems.
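For completeness, the latency figures come from a small-block 
synchronous-write probe in the spirit of the DRBD documentation (a sketch; the 
path and count are placeholders, and oflag=dsync makes every 512-byte write 
wait for completion before the next one starts):

```shell
#!/bin/sh
# Write-latency probe: many small synchronous writes; the average latency is
# roughly elapsed_time / COUNT as reported by dd / time.
TARGET=/tmp/latency-test.bin
COUNT=100
dd if=/dev/zero of="$TARGET" bs=512 count=$COUNT oflag=dsync
SIZE=$(stat -c %s "$TARGET")   # 100 * 512 = 51200 bytes
echo "wrote $SIZE bytes in $COUNT synchronous writes"
rm -f "$TARGET"
```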

--
Luca Lesinigo
LM Networks Srl
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user