I am forced to reply to myself here this morning after coming across this: http://old.nabble.com/Performance-regression-with-DRBD-8.3.12-and-newer-to33995000.html#a33995000
That thread discusses big changes introduced in EL6 with regard to barriers and flushes in the RAID controller. I just re-tested to be 100% sure: I already had disk barriers and flushes disabled, but using the older syntax (yes, I know it was written to stay backward compatible for a while, but just in case...), so I changed to the newer syntax in my drbd.conf:

    disk-barrier no;
    disk-flushes no;
    md-flushes no;

I still find that the cfq scheduler gives me 3X the performance of deadline (or noop, or anticipatory for that matter), and turning those 3 settings on or off makes no difference in the problem I am having.

Also this morning I had time to test the new read-balancing in 8.4.1. I tried:

    read-balancing when-congested-remote;

thinking that this means if HA1 is congested (I see 97%+ I/O wait and disk reads as low as 80MB/s during my cp test - far below what my RAID controller is capable of), the system can balance out to a degree by reading from the HA2 node. But that didn't change things. Then I remembered I only had one node up and running, so DRBD is in StandAlone mode, which doesn't help me - it can't read from the other node if it's not there! Am I at least understanding all of this correctly - is that how read-balancing is supposed to work? If I had my other node online, would I have expected to see a difference at all?

Recall that in my OP I am copying from the drbd0 partition to /dev/shm and seeing performance 3X lower than if I perform the same test from a non-DRBD partition to /dev/shm.

-Thanks

--
Kenneth DeChick
Linux Systems Administrator
MEDENT
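[Editor's note: for reference, a minimal sketch of how those settings look together in an 8.4-style disk section. Option names are as documented for DRBD 8.4; the comments on behavior are assumptions based on that documentation, and read-balancing can only take effect while the peer is Connected, not in StandAlone mode.]

    disk {
        disk-barrier no;                       # 8.4 syntax; replaces 8.3's "no-disk-barrier"
        disk-flushes no;                       # replaces "no-disk-flushes"
        md-flushes no;                         # replaces "no-md-flushes"
        read-balancing when-congested-remote;  # reads go to the peer only while Connected
    }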
----- Original Message -----
From: "Ken Dechick" <kend@medent.com>
To: [email protected]
Sent: Thursday, July 12, 2012 4:37:24 PM
Subject: DRBD performance on IBM M5110e controller

Hello list,

Odd one for you today. We recently started looking at the new IBM x3650 M4 server as our next high-end machine for all clients. We have deployed several standalone installations at smaller clients running CentOS 6.2, and things have been running very well - this is a monster of a server. A couple of days ago I began setting up a pair of machines in HA, as that's the next logical step, and suddenly ran into a DRBD performance issue I do not understand.

I set this pair of machines up the same way I always did under CentOS 5.3:

- 2 servers in an active/passive configuration
- IBM x3650 M4, 1U rack mount, with a 4.3TB DRBD partition across 16 600GB 2.5" HDDs in RAID 10
- RAID controller: IBM MegaRAID M5110e (LSI SAS2208 "Thunderbolt")
- 132GB system RAM
- CentOS 6.2, kernel 2.6.32-220.7.1.el6.x86_64
- heartbeat v3.0.4, pacemaker v1.1.6, DRBD v8.4.1
- the standard tuning I have developed over the past couple of years with IBM hardware and the handy DRBD tuning guide:
  - deadline scheduler via "elevator=deadline" on the kernel command line
  - this drbd.conf:

global {
    usage-count yes;
}
common {
    handlers {
        pri-on-incon-degr "/usr/local/bin/support_drbd_deg";
        split-brain "/usr/local/bin/support_drbd_sb";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    disk {
        resync-rate 300M;       # bandwidth limit for background synchronization;
                                # use 30M for a 1Gb NIC, 300M for a 10Gb NIC
        al-extents 3833;        # must be prime; number of active sets
        on-io-error detach;     # what to do when the lower-level device errors
        disk-barrier no;
        disk-flushes no;
        md-flushes no;
        fencing resource-only;
        #size 1000G;            # for setting exact size of the DRBD resource - DO NOT uncomment this!!
        #become-primary-on node-name   # use this for DRBD withOUT heartbeat
    }
    net {
        protocol C;
        verify-alg md5;         # can also use sha1, crc32c, etc.
        csums-alg md5;          # can also use sha1, crc32c, etc.
        #timeout 60;            # 6 seconds (unit = 0.1 seconds)
        #connect-int 10;        # 10 seconds (unit = 1 second)
        #ping-int 10;           # 10 seconds (unit = 1 second)
        #ping-timeout 5;        # 500 ms (unit = 0.1 seconds)
        unplug-watermark 131072;    # flush RAID controller buffers
        max-buffers 80000;      # data block buffers used before writing to disk
        max-epoch-size 20000;   # set max transfer size
        sndbuf-size 0;
        rcvbuf-size 0;
        ko-count 4;             # peer is dead if this count is exceeded
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
        cram-hmac-alg "sha256";
    }
}
resource drbd0 {
    options {
        cpu-mask 0;
        on-no-data-accessible io-error;
    }
    device /dev/drbd0;
    disk /dev/sda4;
    meta-disk internal;
    on mofpeasHA1 { address 10.211.32.1:7789; }
    on mofpeasHA2 { address 10.211.32.2:7789; }
}

I did of course diligently read through all of the documentation for DRBD v8.4.1, this being my first time above v8.3.7 on CentOS 5.3. Found these new options that sounded flavorful:

options {
    cpu-mask 0;
    on-no-data-accessible io-error;
}

I also found that many options had moved around (resync-rate replacing the old rate, no more syncer section, etc.), so I modified the config we have been using for a few years to reflect all these new changes. The above config seems to work just fine at this point.

Doing a simple test of copying 4.5GB of data from my DRBD partition directly to memory (/dev/shm/) - I have plenty of room there:

    tmpfs    64G    4.5G    59G    8%    /dev/shm

- echo 3 > /proc/sys/vm/drop_caches    <- first I drop the cache for an accurate test
- time cp -rp /usr/medent/tapetest/ /dev/shm/.    <- copy a directory with roughly 4.5GB of real-world data to system memory

    real    0m57.623s
    user    0m0.001s
    sys     0m0.188s

Wow, that takes a long time - almost a full minute. That doesn't seem right, as this machine is blazing fast. So I clear the cache and /dev/shm, then try the same test pulling the same data from a non-DRBD partition:

- echo 3 > /proc/sys/vm/drop_caches    <- first I drop the cache for an accurate test
- time cp -rp /root/tapetest/ /dev/shm/.

    real    0m7.625s
    user    0m0.064s
    sys     0m3.272s

Quite a large difference!! I can reproduce this over and over again - it happens whether DRBD is online and fully replicating, or if I take one node down to run without any replication going on, just to be sure.

We did so much tuning in DRBD back on CentOS 5 with the IBM MegaRAID M5015 in a similar RAID 10 that I would hate to strip my config down to the post-install defaults and start from scratch.

--
Kenneth DeChick
Linux Systems Administrator
MEDENT

-- Kirk to Enterprise -- beam down yeoman Rand and a six-pack.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
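[Editor's note: the cold-cache cp benchmark described above can be made repeatable with a small script. This is a sketch under assumptions: SRC and DST are placeholders (point SRC at a directory on the partition under test), dropping the page cache requires root and is silently skipped when not permitted, and `time` here is the bash builtin.]

```shell
#!/bin/bash
# Repeatable cold-cache read benchmark, modeled on the test in the post.
# SRC is a placeholder source directory; defaults to an empty temp dir for a dry run.
SRC=${SRC:-$(mktemp -d)}
# DST lives on tmpfs so the copy measures read speed of SRC, not write speed of a disk.
DST=${DST:-/dev/shm/benchtest}

sync
# Drop the page cache so the read is cold (needs root; ignored if not permitted).
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true

rm -rf "$DST"
time cp -rp "$SRC" "$DST"   # the number that matters is "real"
rm -rf "$DST"               # clean up tmpfs afterwards
```

Running it once against the DRBD-backed directory and once against a non-DRBD copy of the same data gives directly comparable "real" times.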
