On 09/25/2013 01:06 PM, Andrew Lau wrote:
> On Wed, Sep 25, 2013 at 2:28 PM, Ravishankar N <[email protected]> wrote:
>
>> On 09/25/2013 06:16 AM, Andrew Lau wrote:
>>
>>> That's where I found the 200+ entries:
>>>
>>> [root@hv01] gluster volume heal STORAGE info split-brain
>>> Gathering Heal info on volume STORAGE has been successful
>>>
>>> Brick hv01:/data1
>>> Number of entries: 271
>>> at                   path on brick
>>> 2013-09-25 00:04:29  /6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids
>>> 2013-09-25 00:04:29  /6682d31f-39ce-4896-99ef-14e1c9682585/images/5599c7c7-0c25-459a-9d7d-80190a7c739b/0593d351-2ab1-49cd-a9b6-c94c897ebcc7
>>> 2013-09-24 23:54:29  <gfid:9c83f7e4-6982-4477-816b-172e4e640566>
>>> 2013-09-24 23:54:29  <gfid:91e98909-c217-417b-a3c1-4cf0f2356e14>
>>> <snip>
>>>
>>> Brick hv02:/data1
>>> Number of entries: 0
>>>
>>> When I run the same command on hv02, it shows the reverse (the other
>>> node having 0 entries). I remember having to delete these files
>>> individually in another split-brain case, but I was hoping there was a
>>> better solution than going through 200+ entries.
>>
>> While I haven't tried it out myself, Jeff Darcy has written a script
>> (https://github.com/jdarcy/glusterfs/tree/heal-script/extras/heal_script)
>> which helps in automating the process. He has detailed its usage in his
>> blog post: http://hekafs.org/index.php/2012/06/healing-split-brain/
>>
>> Hope this helps.
>> -Ravi
>
> That didn't end up working: ImportError: No module named volfilter
Oh, you need to download all 4 python scripts in the heal_script folder.
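If it's easier, cloning the whole branch should also work; a minimal sketch, assuming the heal-script branch is still available at the URL above:

    # Clone only the heal-script branch so volfilter.py and the other
    # helper modules end up next to the main script:
    git clone -b heal-script https://github.com/jdarcy/glusterfs.git
    cd glusterfs/extras/heal_script
    ls *.py   # all four scripts should show up here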
> But I didn't end up spending much time with it, as the number of entries
> magically reduced to 10. I removed the files, and the split-brain info now
> reports 0 entries. Still wondering why there are different file sizes on
> the two bricks.
>
> Cheers.
>
> On Wed, Sep 25, 2013 at 10:39 AM, Mohit Anchlia <[email protected]> wrote:
>
>> What's the output of:
>>
>> gluster volume heal $VOLUME info split-brain
>>
>> On Tue, Sep 24, 2013 at 5:33 PM, Andrew Lau <[email protected]> wrote:
>>
>>> Found the BZ https://bugzilla.redhat.com/show_bug.cgi?id=960190 - so I
>>> restarted one of the volumes, and that seems to have restarted all the
>>> daemons again.
>>>
>>> Self heal started again, but I seem to have split-brain issues
>>> everywhere. There are over 100 different entries on each node; what's
>>> the best way to recover now, short of manually going through and
>>> deleting 200+ files? It looks like a full split-brain, as the file
>>> sizes on the two nodes are out of balance by about 100GB or so.
>>>
>>> Any suggestions would be much appreciated!
>>>
>>> Cheers.
>>>
>>> On Tue, Sep 24, 2013 at 10:32 PM, Andrew Lau <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Right now I have a 2x1 replica. Ever since I had to reinstall one of
>>>> the gluster servers, there have been issues with split-brain. The
>>>> self-heal daemon doesn't seem to be running on either of the nodes.
>>>>
>>>> To reinstall the gluster server (the original brick data was intact,
>>>> but the OS had to be reinstalled), I:
>>>>
>>>> - Reinstalled gluster
>>>> - Copied over the old UUID from backup
>>>> - gluster peer probe
>>>> - gluster volume sync $othernode all
>>>> - mount -t glusterfs localhost:STORAGE /mnt
>>>> - find /mnt -noleaf -print0 | xargs --null stat >/dev/null 2>/var/log/glusterfs/mnt-selfheal.log
>>>>
>>>> I let it resync, and it was working fine, or at least so I thought. I
>>>> came back a few days later to find a mismatch between the brick
>>>> volumes: one is 50GB ahead of the other.
>>>>
>>>> # gluster volume heal STORAGE info
>>>> Status: self-heal-daemon is not running on 966456a1-b8a6-4ca8-9da7-d0eb96997cbe
>>>>
>>>> /var/log/glusterfs/glustershd.log doesn't seem to have any recent
>>>> entries, only those from when the two original gluster servers were
>>>> running.
>>>>
>>>> # gluster volume status
>>>> Self-heal Daemon on localhost    N/A    N    N/A
>>>>
>>>> Any suggestions would be much appreciated!
>>>>
>>>> Cheers,
>>>> Andrew.
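On the "self-heal-daemon is not running" status further down the thread:
restarting the volume, as you did, is the usual fix. An untested sketch of
the non-disruptive variant, using the volume name from this thread:

    # "force" respawns any of the volume's daemons that have died
    # (glustershd included) without restarting healthy bricks:
    gluster volume start STORAGE force

    # The self-heal daemon should now show as Online:
    gluster volume status STORAGE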
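And if anyone does end up walking the 200+ entries by hand, the usual
per-file recipe on replicate volumes is to delete the bad copy on one brick
together with its .glusterfs hard link, then trigger a heal. A rough sketch
only: the GFID below is one of the entries from the output above, paired
with the ids file purely for illustration; read the real one from the
file's trusted.gfid xattr.

    # On the brick holding the copy you have decided to discard:
    getfattr -n trusted.gfid -e hex \
        /data1/6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids

    # Remove the file itself...
    rm /data1/6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids

    # ...and its hard link under .glusterfs/<aa>/<bb>/<gfid>, where aa and
    # bb are the first two byte pairs of the GFID (hypothetical value here):
    rm /data1/.glusterfs/9c/83/9c83f7e4-6982-4477-816b-172e4e640566

    # Finally, trigger a full heal so the good copy is replicated back:
    gluster volume heal STORAGE full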
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
