Firstly I want to say thank you to the developers and maintainers of DRBD for a
great application. I have been using it in production for a couple of years
and it has worked extremely well.
This is a lengthy post as I wanted to give a reasonable amount of detail
describing my system and the problem. I trust this is OK and that you will
persist with reading. :-)
Some background on my system:
I have an iSCSI storage system comprising two identical servers, each with a
3TB RAID10 array (6 x 1TB enterprise-grade SATA II disks on a 3Ware 9690SA
controller with BBU), which were running DRBD (v8.3.7) over LVM in a
primary-secondary configuration. The host OS was Ubuntu 10.04 LTS, using the
default DRBD packages provided by that distribution.
The two servers were upgraded to Ubuntu 12.04 LTS last week which includes
kernel 3.2.0-35 and DRBD 8.3.11. The upgrade went smoothly and DRBD is using
the same configuration files as I had created for 10.04 LTS.
The deadline I/O scheduler is being used, not the default CFQ.
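For reference, the scheduler was switched in the usual way; the device name
below is just a placeholder for the 3Ware RAID device:

    # select deadline for the backing block device (sdX is illustrative)
    echo deadline > /sys/block/sdX/queue/scheduler
    # or persistently, via the kernel command line: elevator=deadline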
The iSCSI storage is configured with LVM to give three volume groups and ten
logical volumes. The storage is used for file-server storage (a Windows 2008
R2 server) and virtual guest storage for a Proxmox v2.2 KVM environment. The
two servers have a private, dedicated bonded (round-robin) dual-port NIC
(Intel Pro/1000 PT) connection for DRBD replication; a sketch of the bonding
setup follows.
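The bond is configured along these lines in /etc/network/interfaces (the
interface names and address below are placeholders, not my actual values):

    auto bond0
    iface bond0 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode balance-rr
        bond-miimon 100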
My problem:
During prolonged writes (approx. 3-6 minutes) caused by a virtual guest
restore initiated from the Proxmox host, iowait (and subsequently the load
average) increases on both the Proxmox server and the primary iSCSI/DRBD
server, to the point where SCSI timeouts occur for the Proxmox server and for
any virtual guests running at the time of the restore.
Problem details:
I needed to restore some Proxmox KVM virtual guests from backups to their
original logical volumes on the iSCSI storage. The process involves
decompressing the virtual guest backup on the Proxmox server, then using dd to
copy the image to a logical volume created by Proxmox on the iSCSI storage LUN.
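Roughly, each restore boils down to something like the following (the path,
VG/LV names and block size are illustrative only):

    # decompressed image -> logical volume exported over iSCSI
    dd if=/tmp/vm-101-disk-1.raw of=/dev/iscsivg/vm-101-disk-1 bs=1M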
During this restore (an image of approx. 40GB) the iowait on the primary iSCSI
server was initially low (<1%), but after approx. 15-30 seconds it climbed to
about 75% (averaged over 8 cores) and stayed there. The load average also
climbed, and eventually the Proxmox host and the virtual guests sharing the
iSCSI storage started getting SCSI timeouts and locking up.
This behaviour is reproducible.
When a restore from the Proxmox host to the iSCSI/DRBD storage is not running,
the system performs very well.
Analysis of the problem:
I have spent some time investigating this to try to determine why iowait is so
high in this scenario. I have found the following.
1. If the resource on the primary is connected to its peer on the secondary
(the normal primary-secondary DRBD configuration), iowait climbs to approx.
75% after approx. 15-30 seconds. The write speed from the Proxmox host to the
primary iSCSI/DRBD node is approx. 75 MBytes/sec and the replication bond link
runs at approx. 650 Mbits/sec.
2. If I disconnect that resource on the primary iSCSI/DRBD node, the high
iowait does not occur at all. Write performance from the Proxmox host to the
primary iSCSI is basically wire speed (110-120 MBytes/sec).
3. If I then reconnect the resource, synchronization starts as expected and
runs successfully (syncer rate set at 150M) with the bond running at approx.
1700 Mbits/sec. (The commands used for steps 2 and 3 are shown after this
list.)
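The disconnect/reconnect in steps 2 and 3 was done with the standard drbdadm
commands; the resource name below is a placeholder:

    drbdadm disconnect r0    # primary goes StandAlone for that resource
    drbdadm connect r0       # reconnect; resync then runs at the syncer rate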
These results suggest that the DRBD layer is not adding much overhead when
running in StandAlone mode. However, when running in connected mode (Protocol
C) something is going on that causes the high iowait.
In connected mode, even though the incoming network connection from the
Proxmox server runs at approx. 920 Mbits/sec, the bonded network connection
between the DRBD nodes only runs at approx. 600 Mbits/sec. When
resynchronization is running (step 3 above) the bond runs at approx.
1800 Mbits/sec.
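In case it is useful to anyone trying to reproduce this, a simple way to watch
the bond throughput during a restore is to sample the interface counters, e.g.
(bond0 being whatever the replication bond is called):

    while true; do
        RX1=$(cat /sys/class/net/bond0/statistics/rx_bytes)
        TX1=$(cat /sys/class/net/bond0/statistics/tx_bytes)
        sleep 1
        RX2=$(cat /sys/class/net/bond0/statistics/rx_bytes)
        TX2=$(cat /sys/class/net/bond0/statistics/tx_bytes)
        echo "rx $(( (RX2-RX1)*8/1000000 )) Mbit/s  tx $(( (TX2-TX1)*8/1000000 )) Mbit/s"
    done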
This may not actually be a DRBD problem at all, but rather some other IO
problem or interaction; I can't work out what at this point.
My hunch is that the initial delay, followed by the climb in iowait, is
related to IO buffers filling up somewhere in the OS/iSCSI/DRBD/network layers.
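If it would help, I can capture the kernel writeback state while a restore
runs; a minimal sketch of what I would sample (standard tools, nothing
DRBD-specific):

    # dirty/writeback pages during the restore
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
    # current writeback thresholds
    sysctl vm.dirty_ratio vm.dirty_background_ratio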
I am using a standard DRBD config and can supply full details if required (the
relevant section is sketched below). I have tried increasing max-buffers and
max-epoch-size to 8000 and setting sndbuf-size to 0 (autotune), but these have
not made much, if any, impact. I wanted to keep the posting as short as I
could.
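For reference, the relevant part of the resource config as currently tuned
looks roughly like this (DRBD 8.3 syntax; the resource name is a placeholder
and the on-host/device sections are omitted):

    resource r0 {
        protocol C;
        net {
            max-buffers     8000;
            max-epoch-size  8000;
            sndbuf-size     0;      # 0 = autotune
        }
        syncer {
            rate 150M;
        }
    }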
I have come across a few reports of similar behaviour on the net, but have not
found a solution that appears relevant to my situation.
Any comments and suggestions would be welcome.
Regards
Paul