On 11/07/2013 05:19 PM, Øystein Viggen wrote:
Hi,

I have a small test setup on Ubuntu 12.04, using the
3.4.1-ubuntu1~precise1 packages of glusterfs from the recommended PPA.
There are two gluster servers (replicate 2) and one client.  Bricks are
16 GB xfs -i size=512 filesystems.  All servers run on vmware.

I've been using the linux kernel source for some simple performance and
stability tests with many small files.  When deleting the linux kernel
tree with rm -Rf while rebooting one glusterfs server, it seems that
some deletes are missed, or "recreated".  Here's how it goes:

root@client:/mnt# rm -Rf linux-3.12

At this point, I run "shutdown -r now" on one server.  The deletion
seems to keep running just fine, but just as the server comes back up, I
get something like this on the client:

rm: cannot remove `linux-3.12/arch/mips/ralink/dts': Directory not empty

After the rm has run to completion:

root@client:/mnt# find linux-3.12 -type f
linux-3.12/arch/mips/ralink/dts/Makefile

Sometimes it's more than one file, too.  "gluster volume heal volname
info" shows no outstanding entries.

If I turn off one server before running rm, and turn it on during the rm
run, a similar thing happens, only it seems worse.  In one test, I had
9220 files left after rm had finished.

If both servers are up during the rm run, all files are deleted as
expected every time.


What is happening here, and can I do something to avoid it?
It sounds like a split-brain issue. The following commands will help you confirm that:

 gluster v heal <volumeName> info split-brain
 gluster v heal <volumeName> info heal-failed

If you see any split-brain entries, it is a bug. We can check with gluster-devel whether it is already fixed in the master branch or whether there is a bug filed for it in Bugzilla.
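To make the output of those commands easier to compare between test runs, here is a minimal sketch that totals the per-brick counts. It assumes the "Number of entries: N" lines that the 3.4 CLI prints for each brick; other versions may format the output differently. The function name count_heal_entries and the VOLNAME placeholder are made up for illustration.

```shell
#!/bin/sh
# Sum the "Number of entries: N" lines that the heal info commands print
# once per brick. Reads the command output on stdin.
# Assumption: the per-brick line format matches the 3.4 CLI.
count_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage (VOLNAME is a placeholder for your volume name):
#   gluster volume heal VOLNAME info             | count_heal_entries
#   gluster volume heal VOLNAME info split-brain | count_heal_entries
#   gluster volume heal VOLNAME info heal-failed | count_heal_entries
```

A non-zero total for split-brain or heal-failed is what would be worth reporting.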


I was hoping that in a replica 2 cluster, you could safely reboot one
server at a time (with sync-up time in between) to, say, apply OS
patches without taking the gluster volume offline.

Yes, this should work, but I am not sure whether a bug in gluster is causing the issue for you. The workaround would be:

1. Stop or kill all gluster services on one of the machines.
2. Make sure the glusterd service does not start automatically at the next boot (a one-time change).
3. Apply the OS patches and reboot the machine.
4. Start the glusterd service.
5. Wait for the self-heal process to finish all the required syncing.

Once this node has fully consistent data, you can repeat the steps on the other node.
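The "wait for self-heal" part of that workaround could be scripted roughly as below. This is only a sketch: wait_until_healed is a hypothetical helper, it again assumes the "Number of entries: N" lines of the 3.4 CLI, and stopping or starting the services is left to whatever your init system uses. Note that the original report saw an empty heal info even with files left behind, so a zero count is a necessary check, not proof that the bricks agree.

```shell
#!/bin/sh
# wait_until_healed TRIES DELAY CMD...
# Runs CMD (expected to print heal info output) until every
# "Number of entries:" line sums to 0, or until TRIES attempts are used up.
# A failing CMD counts as "not healed yet" rather than as success.
wait_until_healed() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        if out=$("$@"); then
            pending=$(printf '%s\n' "$out" |
                awk -F': *' '/^Number of entries:/ { t += $2 } END { print t + 0 }')
            [ "$pending" -eq 0 ] && return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage after the patched node is back and glusterd is started
# (VOLNAME is a placeholder; 60 tries, 10 seconds apart):
#   wait_until_healed 60 10 gluster volume heal VOLNAME info \
#       && echo "safe to move on to the other node"
```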
I'm thankful for any help.

Øystein
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users

