Hello, I've found GlusterFS to be an interesting project. I don't have much experience with it yet (although I have similar use cases from DRBD+NFS setups), so I set up a test case to try out failover and recovery.
For this I have a setup with two glusterfs servers (each a VM) and one client (also a VM). I'm using GlusterFS 3.4, by the way.

The servers manage a gluster volume created as:

  gluster volume create testvol rep 2 transport tcp gs1:/export/vda1/brick gs2:/export/vda1/brick
  gluster volume start testvol
  gluster volume set testvol network.ping-timeout 5

The client then mounts this volume as:

  mount -t glusterfs gs1:/testvol /import/testvol

Everything seems to work well in normal use: I can write to and read from the volume, take servers down and up again, and so on.

As a fault scenario, I'm testing fault injection like this:

1. Continuously writing timestamps to a file on the volume from the client. This is automated in a small test script:

[email protected]:~/glusterfs-test$ cat scripts/test-gfs-client.sh
#!/bin/sh

gfs=/import/testvol

while true; do
    date +%s >> $gfs/timestamp.txt
    ts=`tail -1 $gfs/timestamp.txt`
    md5sum=`md5sum $gfs/timestamp.txt | cut -f1 -d" "`
    echo "Timestamp = $ts, md5sum = $md5sum"
    sleep 1
done
[email protected]:~/glusterfs-test$

As can be seen, the client is a quite simple user of the glusterfs volume: low data rate, a single user, and so on.

2. Disabling ethernet on one of the server VMs (ifconfig eth0 down) to simulate a broken network.

3. After a short while, bringing the failed server back again (ifconfig eth0 up).

Steps 2 and 3 are also automated in a test script:

[email protected]:~/glusterfs-test$ cat scripts/fault-injection.sh
#!/bin/sh
# fault injection script tailored for two glusterfs nodes named gs1 and gs2

if [ "$HOSTNAME" = "gs1" ]; then
    peer="gs2"
else
    peer="gs1"
fi

# Take the network down for 10 seconds, then bring it back up.
inject_eth_fault() {
    echo "network down..."
    ifconfig eth0 down
    sleep 10
    ifconfig eth0 up
    echo "... and network up again."
}

# Restart glusterd after the network is back.
recover() {
    echo "recovering from fault..."
    service glusterd restart
}

while true; do
    sleep 60
    if [ ! -f /tmp/nofault ]; then
        if ping -c 1 $peer; then
            inject_eth_fault
            recover
        fi
    fi
done
[email protected]:~/glusterfs-test$

I then see that:

A. This goes well the first time: one server leaves the cluster and the client hangs for about 8 seconds before being able to write to the volume again.

B. When the failed server comes back, both servers see each other and "gluster peer status" shows that each believes the other is in connected state.

C. When the failed server comes back, it does not automatically start participating in syncing the volume again (the timestamp file on its local brick isn't updated).

D. If I restart the glusterd service (service glusterd restart), the failed node seems to get back to the state it was in before. Not always, though. The chance is higher if I leave a long time between fault injections (long = 60 seconds or so, with a forced faulty state of 10 seconds). With a period of some minutes, the cluster can keep servicing the client for at least 8+ hours; if I shorten the period, I'm easily down to 10-15 minutes.

E. Sooner or later I end up in a state where the two servers seem to be up, each sees its peer ("gluster peer status"), and yet neither is serving the volume to the client. I've tried to "heal" the volume in different ways (the commands I use are listed right after this list), but it doesn't help. Sometimes one of the timestamp copies on the servers is simply ahead of the other, which is the easier case, but sometimes both timestamp files have data appended at the end that the other doesn't have.
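In case it helps, this is roughly how I inspect the volume when I reach state E. I'm not certain these are the right commands to be running on 3.4, so please treat this as what I happen to try rather than a known-good diagnosis:

  # run on one of the servers
  # (not sure these are the right checks -- corrections welcome)
  gluster peer status
  gluster volume status testvol
  gluster volume heal testvol info
  gluster volume heal testvol info split-brain

So far this has not shown me a way out of state E, hence the questions below.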
To the questions:

* Is it so that, from a design point of view, the glusterfs team's choice is that one shouldn't rely solely on the glusterfs daemons being able to recover from a faulty state? Is a cluster manager service (heartbeat, for example) needed as part of the setup? That would make experience C understandable, and one could then use heartbeat or a similar package to start/stop the services.

* What would then be the recommended procedure for recovering a faulty glusterfs node, so that experiences D and E don't happen? (What I currently attempt is sketched in the P.S. below.)

* What is the expected failover time (of course depending on config, but say with a given ping timeout etc.), and what is the expected recovery time (with a similar dependency on config)?

* What/how does the glusterfs team test to make sure that the failover and recovery/healing functionality works?

Any opinions on whether the test case itself is flawed are of course also very welcome.

Best regards,
Per
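P.S. For completeness, the manual recovery I have been attempting when a node gets stuck looks roughly like the sketch below. The command names are the 3.4 ones as far as I understand them, and whether this is anywhere near the recommended procedure is exactly what I'm asking about:

  # on the previously failed server, after the network is back up
  service glusterd restart
  gluster peer status                  # should show the peer as connected again
  gluster volume heal testvol full     # trigger a full self-heal (my assumption)
  gluster volume heal testvol info     # check what is still pending heal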
