On 09/25/2013 01:06 PM, Andrew Lau wrote:
> On Wed, Sep 25, 2013 at 2:28 PM, Ravishankar N <[email protected]> wrote:
>
>> On 09/25/2013 06:16 AM, Andrew Lau wrote:
>>
>>> That's where I found the 200+ entries:
>>>
>>> [root@hv01] gluster volume heal STORAGE info split-brain
>>> Gathering Heal info on volume STORAGE has been successful
>>>
>>> Brick hv01:/data1
>>> Number of entries: 271
>>> at                   path on brick
>>> 2013-09-25 00:04:29  /6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids
>>> 2013-09-25 00:04:29  /6682d31f-39ce-4896-99ef-14e1c9682585/images/5599c7c7-0c25-459a-9d7d-80190a7c739b/0593d351-2ab1-49cd-a9b6-c94c897ebcc7
>>> 2013-09-24 23:54:29  <gfid:9c83f7e4-6982-4477-816b-172e4e640566>
>>> 2013-09-24 23:54:29  <gfid:91e98909-c217-417b-a3c1-4cf0f2356e14>
>>> <snip>
>>>
>>> Brick hv02:/data1
>>> Number of entries: 0
>>>
>>> When I run the same command on hv02, it shows the reverse (the other
>>> node having 0 entries). I remember having to delete these files
>>> individually in another split-brain case, but I was hoping there was a
>>> better solution than going through 200+ entries.
>>
>> While I haven't tried it out myself, Jeff Darcy has written a script
>> (https://github.com/jdarcy/glusterfs/tree/heal-script/extras/heal_script)
>> which helps in automating the process. He has detailed its usage in his
>> blog post: http://hekafs.org/index.php/2012/06/healing-split-brain/
>>
>> Hope this helps.
>> -Ravi
>
> That didn't end up working: ImportError: No module named volfilter
Oh, you need to download all 4 python scripts in the heal_script folder.
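If it's easier, cloning the whole branch should also work; a minimal sketch, assuming the heal-script branch is still available at the URL above:

    # Clone only the heal-script branch so volfilter.py and the other
    # helper modules end up next to the main script:
    git clone -b heal-script https://github.com/jdarcy/glusterfs.git
    cd glusterfs/extras/heal_script
    ls *.py   # all four scripts should show up here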
> But I didn't end up spending much time with it, as the number of entries
> magically reduced to 10. I removed the files, and the split-brain info now
> reports 0 entries. Still wondering why there are different file sizes on
> the two bricks.
>
> Cheers.
>
> On Wed, Sep 25, 2013 at 10:39 AM, Mohit Anchlia <[email protected]> wrote:
>
>> What's the output of:
>>
>> gluster volume heal $VOLUME info split-brain
>>
>> On Tue, Sep 24, 2013 at 5:33 PM, Andrew Lau <[email protected]> wrote:
>>
>>> Found the BZ https://bugzilla.redhat.com/show_bug.cgi?id=960190 - so I
>>> restarted one of the volumes, and that seems to have restarted all the
>>> daemons again.
>>>
>>> Self heal started again, but I seem to have split-brain issues
>>> everywhere. There are over 100 different entries on each node; what's
>>> the best way to recover now, short of manually going through and
>>> deleting 200+ files? It looks like a full split-brain, as the file
>>> sizes on the two nodes are out of balance by about 100GB or so.
>>>
>>> Any suggestions would be much appreciated!
>>>
>>> Cheers.
>>>
>>> On Tue, Sep 24, 2013 at 10:32 PM, Andrew Lau <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Right now I have a 2x1 replica. Ever since I had to reinstall one of
>>>> the gluster servers, there have been issues with split-brain. The
>>>> self-heal daemon doesn't seem to be running on either of the nodes.
>>>>
>>>> To reinstall the gluster server (the original brick data was intact,
>>>> but the OS had to be reinstalled), I:
>>>>
>>>> - Reinstalled gluster
>>>> - Copied over the old UUID from backup
>>>> - gluster peer probe
>>>> - gluster volume sync $othernode all
>>>> - mount -t glusterfs localhost:STORAGE /mnt
>>>> - find /mnt -noleaf -print0 | xargs --null stat >/dev/null 2>/var/log/glusterfs/mnt-selfheal.log
>>>>
>>>> I let it resync, and it was working fine, or at least so I thought. I
>>>> came back a few days later to find a mismatch between the brick
>>>> volumes: one is 50GB ahead of the other.
>>>>
>>>> # gluster volume heal STORAGE info
>>>> Status: self-heal-daemon is not running on 966456a1-b8a6-4ca8-9da7-d0eb96997cbe
>>>>
>>>> /var/log/glusterfs/glustershd.log doesn't seem to have any recent
>>>> entries, only those from when the two original gluster servers were
>>>> running.
>>>>
>>>> # gluster volume status
>>>> Self-heal Daemon on localhost    N/A    N    N/A
>>>>
>>>> Any suggestions would be much appreciated!
>>>>
>>>> Cheers,
>>>> Andrew.
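On the "self-heal-daemon is not running" status further down the thread:
restarting the volume, as you did, is the usual fix. An untested sketch of
the non-disruptive variant, using the volume name from this thread:

    # "force" respawns any of the volume's daemons that have died
    # (glustershd included) without restarting healthy bricks:
    gluster volume start STORAGE force

    # The self-heal daemon should now show as Online:
    gluster volume status STORAGE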
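And if anyone does end up walking the 200+ entries by hand, the usual
per-file recipe on replicate volumes is to delete the bad copy on one brick
together with its .glusterfs hard link, then trigger a heal. A rough sketch
only: the GFID below is one of the entries from the output above, paired
with the ids file purely for illustration; read the real one from the
file's trusted.gfid xattr.

    # On the brick holding the copy you have decided to discard:
    getfattr -n trusted.gfid -e hex \
        /data1/6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids

    # Remove the file itself...
    rm /data1/6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids

    # ...and its hard link under .glusterfs/<aa>/<bb>/<gfid>, where aa and
    # bb are the first two byte pairs of the GFID (hypothetical value here):
    rm /data1/.glusterfs/9c/83/9c83f7e4-6982-4477-816b-172e4e640566

    # Finally, trigger a full heal so the good copy is replicated back:
    gluster volume heal STORAGE full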
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
