Firstly I want to say thank you to the developers and maintainers of DRBD for a
great application. I have been using it in production for a couple of years
and it has worked extremely well.
This is a lengthy post as I wanted to give a reasonable amount of detail
describing my system and the problem. I trust this is OK and that you will
persist with reading. :-)
Some background on my system:
I have an iSCSI storage system comprising two identical servers, each with a
3TB RAID10 array (6 x 1TB enterprise-grade SATA II disks on a 3Ware 9690SA
controller with BBU), which were running DRBD (v8.3.7) over LVM in a
primary-secondary configuration. The host OS was Ubuntu 10.04 LTS, using the
default DRBD packages provided by that distribution.
The two servers were upgraded to Ubuntu 12.04 LTS last week which includes
kernel 3.2.0-35 and DRBD 8.3.11. The upgrade went smoothly and DRBD is using
the same configuration files as I had created for 10.04 LTS.
The deadline I/O scheduler is being used, not the default CFQ.
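For reference, the scheduler was switched in the usual way; the device name
below is just a placeholder for the 3Ware RAID device:

    # select deadline for the backing block device (sdX is illustrative)
    echo deadline > /sys/block/sdX/queue/scheduler
    # or persistently, via the kernel command line: elevator=deadline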
The iSCSI storage is configured with LVM to give three volume groups and ten
logical volumes. The storage is used for file-server storage (a Windows 2008
R2 server) and virtual guest storage for a Proxmox v2.2 KVM environment. The
two servers have a private, dedicated bonded (round-robin) dual-port NIC
(Intel Pro/1000 PT) connection for DRBD replication; a sketch of the bonding
setup follows.
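The bond is configured along these lines in /etc/network/interfaces (the
interface names and address below are placeholders, not my actual values):

    auto bond0
    iface bond0 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode balance-rr
        bond-miimon 100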
My problem:
During prolonged writes (approx. 3-6 minutes) caused by a virtual guest
restore initiated from the Proxmox host, iowait (and subsequently the load
average) increases on both the Proxmox server and the primary iSCSI/DRBD
server, to the point where SCSI timeouts occur for the Proxmox server and for
any virtual guests running at the time of the restore.
Problem details:
I needed to restore some Proxmox KVM virtual guests from backups to their
original logical volumes on the iSCSI storage. The process involves
decompressing the virtual guest backup on the Proxmox server, then using dd to
copy the image to a logical volume created by Proxmox on the iSCSI storage LUN.
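Roughly, each restore boils down to something like the following (the path,
VG/LV names and block size are illustrative only):

    # decompressed image -> logical volume exported over iSCSI
    dd if=/tmp/vm-101-disk-1.raw of=/dev/iscsivg/vm-101-disk-1 bs=1M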
During this restore (an image of approx. 40GB) the iowait on the primary iSCSI
server was initially low (<1%), but after approx. 15-30 seconds it climbed to
about 75% (averaged over 8 cores) and stayed there. The load average also
climbed, and eventually the Proxmox host and the virtual guests sharing the
iSCSI storage started getting SCSI timeouts and locking up.
This behaviour is reproducible.
When a restore from the Proxmox host to the iSCSI/DRBD storage is not running,
the system performs very well.
Analysis of the problem:
I have spent some time investigating this to try to determine why iowait is so
high in this scenario. I have found the following.
1. If the resource on the primary is connected to its peer on the secondary
(the normal primary-secondary DRBD configuration), iowait climbs to approx.
75% after approx. 15-30 seconds. The write speed from the Proxmox host to the
primary iSCSI/DRBD node is approx. 75 MBytes/sec and the replication bond link
runs at approx. 650 Mbits/sec.
2. If I disconnect that resource on the primary iSCSI/DRBD node, the high
iowait does not occur at all. Write performance from the Proxmox host to the
primary iSCSI is basically wire speed (110-120 MBytes/sec).
3. If I then reconnect the resource, synchronization starts as expected and
runs successfully (syncer rate set at 150M) with the bond running at approx.
1700 Mbits/sec. (The commands used for steps 2 and 3 are shown after this
list.)
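The disconnect/reconnect in steps 2 and 3 was done with the standard drbdadm
commands; the resource name below is a placeholder:

    drbdadm disconnect r0    # primary goes StandAlone for that resource
    drbdadm connect r0       # reconnect; resync then runs at the syncer rate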
These results suggest that the DRBD layer is not adding much overhead when
running in StandAlone mode. However, when running in connected mode (Protocol
C) something is going on that causes the high iowait.
In connected mode, even though the incoming network connection from the
Proxmox server runs at approx. 920 Mbits/sec, the bonded network connection
between the DRBD nodes only runs at approx. 600 Mbits/sec. When
resynchronization is running (step 3 above) the bond runs at approx.
1800 Mbits/sec.
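In case it is useful to anyone trying to reproduce this, a simple way to watch
the bond throughput during a restore is to sample the interface counters, e.g.
(bond0 being whatever the replication bond is called):

    while true; do
        RX1=$(cat /sys/class/net/bond0/statistics/rx_bytes)
        TX1=$(cat /sys/class/net/bond0/statistics/tx_bytes)
        sleep 1
        RX2=$(cat /sys/class/net/bond0/statistics/rx_bytes)
        TX2=$(cat /sys/class/net/bond0/statistics/tx_bytes)
        echo "rx $(( (RX2-RX1)*8/1000000 )) Mbit/s  tx $(( (TX2-TX1)*8/1000000 )) Mbit/s"
    done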
This may not actually be a DRBD problem at all, but rather some other IO
problem or interaction; I can't work out what at this point.
My hunch is that the initial delay, followed by the climb in iowait, is
related to IO buffers filling up somewhere in the OS/iSCSI/DRBD/network layers.
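If it would help, I can capture the kernel writeback state while a restore
runs; a minimal sketch of what I would sample (standard tools, nothing
DRBD-specific):

    # dirty/writeback pages during the restore
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
    # current writeback thresholds
    sysctl vm.dirty_ratio vm.dirty_background_ratio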
I am using a standard DRBD config and can supply full details if required (the
relevant section is sketched below). I have tried increasing max-buffers and
max-epoch-size to 8000 and setting sndbuf-size to 0 (autotune), but these have
not made much, if any, impact. I wanted to keep the posting as short as I
could.
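For reference, the relevant part of the resource config as currently tuned
looks roughly like this (DRBD 8.3 syntax; the resource name is a placeholder
and the on-host/device sections are omitted):

    resource r0 {
        protocol C;
        net {
            max-buffers     8000;
            max-epoch-size  8000;
            sndbuf-size     0;      # 0 = autotune
        }
        syncer {
            rate 150M;
        }
    }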
I have come across a few reports of similar behaviour on the net, but have not
found a solution that appears relevant to my situation.
Any comments and suggestions would be welcome.
Regards
Paul