On 07/04/2018 09:20 PM, Anh Vo wrote:
I forgot to mention we're using 3.12.10

On Wed, Jul 4, 2018 at 8:45 AM, Anh Vo <vtq...@gmail.com> wrote:

    If I run "sudo gluster volume heal gv0 split-brain latest-mtime /"
    I get the following:

    Lookup failed on /:Invalid argument.
    Volume heal failed.


Can you share the glfsheal-<volname>.log on the node where you ran this failed command?
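
(On a default install that log is usually at /var/log/glusterfs/glfsheal-gv0.log; the exact path is an assumption if your log directory has been customised. Something like the following should capture the relevant part:

tail -n 100 /var/log/glusterfs/glfsheal-gv0.log)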


    node2 was not connected at that time, because within a few minutes
    of connecting it to the system gluster becomes almost unusable and
    many of our jobs fail. This morning I reconnected it and ran heal
    info; we have about 30000 entries to heal (15K from gfs-vm000 and
    15K from gfs-vm001; 80% are bare gfids, 20% have file names). It's
    not feasible for us to check each gfid individually, so we are
    relying on gluster self-heal to handle those. The "/" entry is a
    concern because it prevents us from mounting over NFS. We need the
    NFS mount for some of our management tasks, because the gluster
    FUSE mount is much slower than NFS for recursive operations like
    'du'.

    Do you have any suggestions for healing the metadata on '/'?

You can manually delete the afr xattrs on node2 (gfs-vm002) as a workaround:
setfattr -x trusted.afr.gv0-client-0 /gluster/brick/brick0
setfattr -x trusted.afr.gv0-client-1 /gluster/brick/brick0

This should remove the split-brain on root.
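
A minimal follow-up sketch, assuming the brick path /gluster/brick/brick0 shown in your getfattr output: after removing the xattrs, confirm they are gone and let self-heal settle the metadata, e.g.

getfattr -d -m . -e hex /gluster/brick/brick0
gluster volume heal gv0
gluster volume heal gv0 info

Heal info should then no longer report "/" as being in split-brain.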

HTH,
Ravi


    Thanks
    Anh

    On Tue, Jul 3, 2018 at 8:02 PM, Ravishankar N
    <ravishan...@redhat.com> wrote:

        Hi,

        What version of gluster are you using?

        1. The afr xattrs on '/' indicate a meta-data split-brain. You
        can resolve it using one of the policies listed in
        https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/

        For example: "gluster volume heal gv0 split-brain latest-mtime /"

        2. Is the file corresponding to the other gfid
        (81289110-867b-42ff-ba3b-1373a187032b) present on all bricks?
        What do the getfattr outputs for this file indicate? (One way
        to locate it on the bricks is sketched after point 3 below.)

        3. As for the discrepancy in the output of heal info, is node2
        connected to the other nodes? Does heal info still print the
        details of all 3 bricks when you run it on node2?
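
        Regarding point 2, a quick way to look at that file from the
        brick side (assuming the standard .glusterfs layout, where each
        gfid is linked under its first two byte pairs) would be:

        getfattr -d -m . -e hex /gluster/brick/brick0/.glusterfs/81/28/81289110-867b-42ff-ba3b-1373a187032b

        Run that on each of the three bricks and compare the afr xattrs.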
        -Ravi


        On 07/04/2018 01:47 AM, Anh Vo wrote:
        Actually we just discovered that the heal info command returns
        different results when executed on different nodes of our
        3-replica setup. When we run it on node2 we do not see the
        split-brain reported on "/", but if I run it on node0 or node1
        I see:

        x@gfs-vm001:~$ sudo gluster volume heal gv0 info | tee heal-info
        Brick gfs-vm000:/gluster/brick/brick0
        <gfid:81289110-867b-42ff-ba3b-1373a187032b>
        / - Is in split-brain

        Status: Connected
        Number of entries: 2

        Brick gfs-vm001:/gluster/brick/brick0
        / - Is in split-brain

        <gfid:81289110-867b-42ff-ba3b-1373a187032b>
        Status: Connected
        Number of entries: 2

        Brick gfs-vm002:/gluster/brick/brick0
        / - Is in split-brain

        Status: Connected
        Number of entries: 1


        I ran getfattr -d -m . -e hex /gluster/brick/brick0 on all
        three nodes, and I can see that node2 has different xattrs:
        node0:
        sudo getfattr -d -m . -e hex /gluster/brick/brick0
        getfattr: Removing leading '/' from absolute path names
        # file: gluster/brick/brick0
        trusted.afr.gv0-client-2=0x000000000000000100000000
        trusted.gfid=0x00000000000000000000000000000001
        trusted.glusterfs.dht=0x000000010000000000000000ffffffff
        trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

        node1:
        sudo getfattr -d -m . -e hex /gluster/brick/brick0
        getfattr: Removing leading '/' from absolute path names
        # file: gluster/brick/brick0
        trusted.afr.gv0-client-2=0x000000000000000100000000
        trusted.gfid=0x00000000000000000000000000000001
        trusted.glusterfs.dht=0x000000010000000000000000ffffffff
        trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2

        node2:
        sudo getfattr -d -m . -e hex /gluster/brick/brick0
        getfattr: Removing leading '/' from absolute path names
        # file: gluster/brick/brick0
        trusted.afr.dirty=0x000000000000000000000000
        trusted.afr.gv0-client-0=0x000000000000000200000000
        trusted.afr.gv0-client-1=0x000000000000000200000000
        trusted.afr.gv0-client-2=0x000000000000000000000000
        trusted.gfid=0x00000000000000000000000000000001
        trusted.glusterfs.dht=0x000000010000000000000000ffffffff
        trusted.glusterfs.volume-id=0x7fa3aac372d543f987ed0c66b77f02e2
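
        (If I am reading the afr xattr layout correctly, where each
        value packs three 32-bit counters for pending data, metadata
        and entry operations, then node0 and node1 each blame node2 for
        one pending metadata operation on '/', while node2 blames node0
        and node1 for two, so the bricks accuse each other: a metadata
        split-brain on the root directory.)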

        Where do I go from here? Thanks




_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
