>You're having us rely on crystal balls at this time. Feed us more information 
>and maybe someone will be able to help out.
Okay, thanks for your reply. I wasn't sure how detailed the post should be; sorry for the inconvenience.
If you need more information, please let me know!
I have a complete installation/configuration guide with all the steps I used to 
configure the servers, but it's currently written in Dutch.

Problem:
When using flashcache + DRBD in the configuration mentioned below, I get a corrupted KVM virtual machine image file after using virsh live migration. This is noticed after repeatedly live-migrating the VM between the two host servers. The guest OS inside the VM is Windows Server 2008 R2. Without flashcache I get no corrupted VM image file (tested 2000+ live migrations of the same VM).
I'm also using the latest virtio-win drivers.

Host OS = CentOS 6.2 x86_64
The migration is started using the following command:
        # virsh migrate --live --verbose test qemu+ssh://vmhost2a.vdl-fittings.local/system
And back again after 5 minutes:
        # virsh migrate --live --verbose test qemu+ssh://vmhost2b.vdl-fittings.local/system
(I used cron to automate the task, so the VM is live-migrated every 5 minutes; see the sketch below.)
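A minimal sketch of the crontab entries behind this (the exact times are from memory; each host only succeeds in migrating the VM while it is actually running there):
        # root crontab on vmhost2b: push the VM to vmhost2a at :00, :10, :20, ...
        */10 * * * * /usr/bin/virsh migrate --live test qemu+ssh://vmhost2a.vdl-fittings.local/system
        # root crontab on vmhost2a: push it back 5 minutes later, at :05, :15, :25, ...
        5-59/10 * * * * /usr/bin/virsh migrate --live test qemu+ssh://vmhost2b.vdl-fittings.local/system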

Symptom:
The VM image corruption was noticed because Windows began complaining about unreadable/corrupted system files. After rebooting the VM, Windows disk check also detected numerous errors. It looks like the cached data is not the same on both servers; I have configured DRBD to use the flashcache device, so all DRBD data should pass through the cache. Static files also became unreadable after a while (when I tried to open a folder containing some files, the folder turned out to be corrupted).
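Since verify-alg is set in the config below, the mismatch between the two replicas can also be confirmed with DRBD's online verify (standard drbdadm usage; a non-zero oos: counter means out-of-sync blocks):
        # drbdadm verify VMstore1      # run on one node while connected, compares blocks using sha1
        # cat /proc/drbd               # after the verify: check the oos: field for out-of-sync blocks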

Configuration:
The two host servers use GFS2 as the cluster storage file system to host the image files (which works fine without flashcache).

I have the following disk setup:
/dev/sda1               raid1 SSD array (200GB) (on an Adaptec 6805 controller)
/dev/sdc1               raid5 HD array  (1.5TB) (on another Adaptec 6805 controller)
/dev/mapper/cachedev    the flashcache device

As for flashcache, I have used two setups:
- 1: flashcache in write-through mode
        # /sbin/flashcache_create -p thru -b 16k cachedev /dev/sda1 /dev/sdc1
- 2: flashcache in write-back mode
        # /sbin/flashcache_create -p back -b 16k cachedev /dev/sda1 /dev/sdc1
Both setups show the same VM image corruption.
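For completeness, the resulting mapping can be checked with the standard device-mapper tools before DRBD is layered on top (the exact status fields depend on the flashcache build):
        # dmsetup table cachedev       # shows the flashcache target and its SSD/disk devices
        # dmsetup status cachedev      # cache statistics (hits, dirty blocks, etc.)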

The DRBD device /dev/drbd0 is mounted on /VM.
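(For reference, a GFS2 file system on top of /dev/drbd0 would be created and mounted roughly like this; the journal count and lock table name are assumptions matching the cluster name VMhost2 below:)
        # mkfs.gfs2 -p lock_dlm -t VMhost2:VMstore1 -j 2 /dev/drbd0
        # mount -t gfs2 -o noatime /dev/drbd0 /VM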

My DRBD setup is as follows (using DRBD 8.3.12):
#------------------------------------------------------------------------
# Just for testing I kept all configuration in one file
#include "drbd.d/global_common.conf";
#include "drbd.d/*.res";
global {
    minor-count 64;
    usage-count yes;
}

common {
  syncer {
    rate 110M;
    verify-alg sha1;
    csums-alg sha1;
    al-extents 3733;
#   cpu-mask 3;
  }
}

resource VMstore1 {

  protocol C;

  startup {
    wfc-timeout  1800; # 30 min
    degr-wfc-timeout 120;    # 2 minutes.
    wait-after-sb;
    become-primary-on both;
  }

  disk {
   no-disk-barrier;
#   no-disk-flushes;
  }

  net {
    max-buffers 8000;
    max-epoch-size 8000;
    sndbuf-size 0;
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  syncer {
    cpu-mask 3;
  }


 on vmhost2a.vdl-fittings.local {
    device     /dev/drbd0;
    disk       /dev/mapper/cachedev;
    address    192.168.100.3:7788;
    meta-disk  internal;
  }
 on vmhost2b.vdl-fittings.local {
    device    /dev/drbd0;
    disk      /dev/mapper/cachedev;
    address   192.168.100.4:7788;
    meta-disk internal;
  }
}
#------------------------------------------------------------------------
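A sketch of how such a resource is typically brought up in dual-primary mode (standard drbdadm commands for 8.3; the initial sync is forced from one node only):
        # drbdadm create-md VMstore1                             # once per node, before first use
        # drbdadm up VMstore1                                    # on both nodes
        # drbdadm -- --overwrite-data-of-peer primary VMstore1   # on ONE node only, for the initial sync
        # drbdadm primary VMstore1                               # afterwards on both nodes (allow-two-primaries)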

Cluster configuration: (no fence devices)
#------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster config_version="8" name="VMhost2">
        <cman expected_votes="1" two_node="1"/>
        <clusternodes>
                <clusternode name="vmhost2a.vdl-fittings.local" nodeid="1" votes="1"/>
                <clusternode name="vmhost2b.vdl-fittings.local" nodeid="2" votes="1"/>
        </clusternodes>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="KVM" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="vmhost2a.vdl-fittings.local" priority="1"/>
                                <failoverdomainnode name="vmhost2b.vdl-fittings.local" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <vm autostart="0" domain="KVM" exclusive="0" max_restarts="0" name="test" path="/VM/KVM/test/" recovery="disable" restart_expire_time="0"/>
                </resources>
        </rm>
</cluster>
#------------------------------------------------------------------------
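The vm resource defined above can also be checked and controlled through rgmanager (sketch; in the tests above I started the migrations with virsh directly, and the vm: prefix is how rgmanager names vm resources):
        # clustat                                                # cluster membership and service state
        # clusvcadm -e vm:test -m vmhost2a.vdl-fittings.local    # enable the VM on a specific member
        # clusvcadm -M vm:test -m vmhost2b.vdl-fittings.local    # live-migrate the VM via rgmanager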
Flashcache version
#------------------------------------------------------------------------ 
[root@vmhost2a ~]# modinfo flashcache
filename:       /lib/modules/2.6.32-220.4.2.el6.x86_64/weak-updates/flashcache/flashcache.ko
license:        GPL
author:         Mohan - based on code by Ming
description:    device-mapper Facebook flash cache target
srcversion:     E1A5D9AA620A2EACC9FA891
depends:        dm-mod
vermagic:       2.6.32-220.el6.x86_64 SMP mod_unload modversions
#------------------------------------------------------------------------

| Van de Lande BV. | Lissenveld 1 | 4941VK | Raamsdonksveer | the Netherlands 
|T +31 (0) 162 516000 | F +31 (0) 162 521417 | www.vdl-fittings.com |

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Florian Haas
Sent: Wednesday, 22 February 2012 21:40
To: drbd-user
Subject: Re: [DRBD-user] flashcache + drbd + LVM2 + GFS2 + KVM live migration -> data corruption

Maurits,

Just reposting here isn't going to help. Neither in your original post nor in 
the bug report you reference did you mention anything other than that you got a 
"corrupted" VM. In what way? What was your _complete_ DRBD configuration? (You 
gave us only an obviously incomplete snippet). What was your VM configuration? 
How did you initiate the live migration? What kind of corruption did you see?

You're having us rely on crystal balls at this time. Feed us more information 
and maybe someone will be able to help out.

Florian

--
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user

