On 10/22/2013 02:42 AM, José A. Lausuch Sales wrote:
Hi,
we are currently evaluating GlusterFS for a production environment.
Our focus is on the high-availability features of GlusterFS. However,
our tests have not worked out well. Hence I am seeking feedback from you.
In our planned production environment, Gluster should provide shared
storage for VM disk images. So, our very basic initial test setup is
as follows:
We are using two servers, each providing a single brick of a
replicated gluster volume (Gluster 3.4.1). A third server runs a
test-VM (Ubuntu 13.04 on QEMU 1.3.0 and libvirt 1.0.3) which uses a
disk image file stored on the gluster volume as a block device
(/dev/vdb). For testing purposes, the root file system of this VM
(/dev/vda) is a disk image NOT stored on the gluster volume.
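For reference, a replica-2 volume of this kind is set up roughly as follows (hostnames, volume name, and brick paths here are placeholders, not our actual ones):

# on server1: probe the peer and create/start the two-brick replicated volume
gluster peer probe server2
gluster volume create vmstore replica 2 server1:/export/brick1 server2:/export/brick1
gluster volume start vmstore
gluster volume info vmstore    # should show Type: Replicate, Number of Bricks: 1 x 2 = 2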
To test the high-availability features of gluster under load, we run
FIO inside the VM directly on the vdb block device (see configuration
below). So far we have only tested reads. The test procedure is as
follows:
1. We start FIO inside the VM and use "top" to observe which of the
two servers is serving the read requests (i.e., shows increased CPU
load for its glusterfsd brick process). Let's say Server1 shows the
glusterfsd CPU load.
2. While FIO is running, we take down the network interface of Server1
and observe whether Server2 takes over.
You're bringing Server1 down by taking down the NIC (assuming from #5).
This does take down the link, but it does so without closing the TCP
connections. Though this does represent a worst-case scenario, see
http://joejulian.name/blog/keeping-your-vms-from-going-read-only-when-encountering-a-ping-timeout-in-glusterfs/
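If the default 42-second ping-timeout is what you end up waiting for in this scenario, it can be tuned per volume, e.g. (volume name below is a placeholder; very low values are not generally recommended for production):

# lower the client-side ping-timeout (default is 42 seconds)
gluster volume set vmstore network.ping-timeout 10
gluster volume info vmstore    # reconfigured options are listed at the end of the output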
3.This "fail over" works (almost 100% of the times), we see the CPU
load from glusterfsd on Server2. As expected, Server1 does not have
any load because is "offline".
4. After a while we bring the NIC on Server1 back up. Our expectation
was that Server1 would then take over again (something like
active-passive behavior), but this happens only 5-10% of the time; the
CPU load usually stays on Server2.
I'm not sure I would have that expectation. The second server will have
taken over the open FD and the reads should come from there. The reads
for a given fd come from the first-to-respond to the lookup().
5. After some time, we bring down the NIC on Server2, expecting Server1
to take over. This second failover fails: the VM reports I/O errors
that can only be resolved by restarting the VM, and sometimes only by
removing and recreating the volume.
After some further tests, we realized that if we restart the glusterd
daemon (/etc/init.d/glusterd restart) on Server1 after step 3 or before
step 4, Server1 takes over automatically, without bringing down Server2
or anything like that.
Check the logs for glusterd
(/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) for clues. Perhaps
the /way/ you're taking down the NIC is exposing some bug. Perhaps
instead of taking it down, use iptables or just killall glusterfsd.
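For example, on the server you want to "fail" (the client IP below is a placeholder):

# kill the brick process: its TCP connections are closed, so clients fail over at once
killall glusterfsd

# or simulate a silent network failure without touching the NIC:
# drop traffic from the client/hypervisor (placeholder IP)
iptables -A INPUT -s 192.168.1.30 -j DROP
iptables -D INPUT -s 192.168.1.30 -j DROP    # remove the rule when done

# and watch the logs while the test runs
tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log /var/log/glusterfs/bricks/*.log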
We tested this using both the normal FUSE mount and libgfapi. With
FUSE, the local mount sometimes becomes unavailable (ls no longer shows
any files) when the failover fails.
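For reference, the two access methods look roughly like this (server, volume, and image names are placeholders, not our actual ones):

# FUSE: mount the volume on the hypervisor and point the VM at an image file on it
mount -t glusterfs server1:/vmstore /mnt/shared

# libgfapi: let QEMU talk to the volume directly, no FUSE mount involved
# (requires QEMU >= 1.3 built with GlusterFS support)
qemu-system-x86_64 ... \
    -drive file=gluster://server1/vmstore/io-perf-testdisk.img,if=virtio,cache=writethrough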
We have a few fundamental questions in this regard:
i) Is Gluster supposed to handle such a scenario, or are we making
wrong assumptions? The only workaround we found is to restart the
daemon after a network outage, but this is not acceptable in a real
scenario with VMs running real applications.
I host my (raw and qcow2) vm images on a gluster volume. Since my
servers are not expected to hard-crash a lot, I take them down for
maintenance (kernel updates and such) gracefully, killing the processes
first. This closes the TCP connections and everything just keeps humming
along.
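In other words, something along these lines before a reboot:

# stop gluster processes first so the clients see the connections close cleanly
killall glusterfsd            # brick processes
killall glusterfs             # self-heal daemon / gluster NFS, if running
/etc/init.d/glusterd stop     # management daemon
reboot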
ii) What is the recommended configuration in terms of caching (QEMU:
cache=none/writethrough/writeback) and direct I/O (FIO and Gluster) to
maximize the reliability of the failover process? We varied these
parameters but could not find a working configuration. Do they have an
impact at all?
To the best of my knowledge, none of those should affect reliability.
FIO test specification:
[global]
direct=1
ioengine=libaio
iodepth=4
filename=/dev/vdb
runtime=300
numjobs=1
[maxthroughput]
rw=read
bs=16k
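The job file is run inside the VM with something like this (the job file name is arbitrary):

# inside the VM: run the read job against /dev/vdb
fio read-test.fio

# meanwhile, on each gluster server: watch which glusterfsd serves the reads
top -d 1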
VM configuration:
<domain type='kvm' id='6'>
  <name>testvm</name>
  <uuid>93877c03-605b-ed67-1ab2-2ba16b5fb6b5</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.1'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source dev='/mnt/local/io-perf.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source dev='/mnt/shared/io-perf-testdisk.img'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:36:5f:dd'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>
Thank you very much in advance,
Jose Lausuch
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users