On 11/07/2013 05:19 PM, Øystein Viggen wrote:
Hi,
I have a small test setup on Ubuntu 12.04, using the
3.4.1-ubuntu1~precise1 packages of glusterfs from the recommended PPA.
There are two gluster servers (replicate 2) and one client. Bricks are
16 GB xfs -i size=512 filesystems. All servers run on vmware.
I've been using the linux kernel source for some simple performance and
stability tests with many small files. When deleting the linux kernel
tree with rm -Rf while rebooting one glusterfs server, it seems that
some deletes are missed, or "recreated". Here's how it goes:
root@client:/mnt# rm -Rf linux-3.12
At this point, I run "shutdown -r now" on one server. The deletion
seems to keep running just fine, but just as the server comes back up, I
get something like this on the client:
rm: cannot remove `linux-3.12/arch/mips/ralink/dts': Directory not empty
After the rm has run to completion:
root@client:/mnt# find linux-3.12 -type f
linux-3.12/arch/mips/ralink/dts/Makefile
Sometimes it's more than one file, too. "gluster volume heal volname
info" shows no outstanding entries.
If I turn off one server before running rm, and turn it on during the rm
run, a similar thing happens, only it seems worse. In one test, I had
9220 files left after rm had finished.
If both servers are up during the rm run, all files are deleted as
expected every time.
What is happening here, and can I do something to avoid it?
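To make the leftover-file counts from runs like the ones described above easier to compare, a small helper along these lines could be run on the client after each rm. This is only a sketch: the function name leftover_files is made up for illustration, and it simply wraps the same find invocation shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print how many regular files remain under a directory.
# Prints 0 if the directory no longer exists (i.e. rm removed everything).
leftover_files() {
    dir=$1
    if [ -d "$dir" ]; then
        # tr strips the padding some wc implementations add around the count
        find "$dir" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}
```

For example, after "rm -Rf linux-3.12" has finished, "leftover_files /mnt/linux-3.12" would print the number of files that survived the delete.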
This sounds like a split-brain issue. The commands below will help you
check:
gluster v heal <volumeName> info split-brain
gluster v heal <volumeName> info heal-failed
If either command lists any split-brain entries, then it is a bug. We
can check with gluster-devel whether it is already fixed in the master
branch or whether there is a bug open for it in Bugzilla.
I was hoping that in a replica 2 cluster, you could safely reboot one
server at a time (with sync-up time in between) to, say, apply OS
patches without taking the gluster volume offline.
Yup, this should work, but I am not sure whether a bug in gluster is
causing the issue for you. A workaround would be to stop or kill all
gluster services on one of the machines and make sure the glusterd
service does not start automatically at the next boot (a one-time
change). Then apply the patches to the OS, reboot, and start the
glusterd service. Check that self-heal has completed all the required
syncing. Once this node has fully consistent data, you can repeat the
steps on the other node.
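The "check that self-heal has completed" step could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: it assumes that "gluster volume heal VOLNAME info" prints one "Number of entries: N" line per brick (the format seen in 3.4-era releases), and the function names and the 10-second polling interval are arbitrary choices made for illustration.

```shell
#!/bin/sh
# Sum the per-brick "Number of entries: N" lines read from stdin.
# Assumes the heal info output format described above.
pending_heal_entries() {
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# Poll heal info for the given volume until no entries are pending.
# Returns non-zero if the gluster command itself fails, so that a failed
# query is not mistaken for "nothing left to heal".
wait_for_heal() {
    volname=$1
    while :; do
        info=$(gluster volume heal "$volname" info) || return 1
        [ "$(printf '%s\n' "$info" | pending_heal_entries)" -eq 0 ] && return 0
        sleep 10
    done
}
```

After starting glusterd on the rebooted node, one would run "wait_for_heal volname" before moving on to the other node. Note that an empty heal info list was also reported in the original problem description, so this check alone may not catch the missed deletes.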
I'm thankful for any help.
Øystein
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users