Hi,
On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:
Hi Ravi,
As discussed earlier, I investigated this issue and found that healing is not triggered: the "gluster volume heal c_glusterfs info split-brain" command shows no entries in its output, even though the file is in split-brain.
A couple of observations from the 'commands_output' file:
getfattr -d -m . -e hex opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
The afr xattrs do not indicate that the file is in split brain:
# file: opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
getfattr -d -m . -e hex opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
trusted.afr.c_glusterfs-client-0=0x000000080000000000000000
trusted.afr.c_glusterfs-client-2=0x000000020000000000000000
trusted.afr.c_glusterfs-client-4=0x000000020000000000000000
trusted.afr.c_glusterfs-client-6=0x000000020000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
1. There doesn't seem to be a split-brain going by the trusted.afr* xattrs.
2. You seem to have re-used the bricks from another volume/setup. For replica 2, only trusted.afr.c_glusterfs-client-0 and trusted.afr.c_glusterfs-client-1 should be present, but I see four such xattrs: client-0, 2, 4 and 6.
3. On the rebooted node, do you have SSL enabled by any chance? There is a bug for "Not able to fetch volfile" when SSL is enabled:
https://bugzilla.redhat.com/show_bug.cgi?id=1258931
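For reference, each trusted.afr.* value packs three big-endian 32-bit counters: pending data, metadata and entry operations that this brick blames on the named client. A data/metadata split-brain needs non-zero counters on both bricks blaming each other, which is not the case above. A minimal bash sketch to decode the values (the decode_afr helper name is mine):

```shell
# Decode a trusted.afr.* xattr value into its three big-endian 32-bit
# counters: pending data / metadata / entry operations.
decode_afr() {
    local v=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x${v:0:8}" "0x${v:8:8}" "0x${v:16:8}"
}

decode_afr 0x000000080000000000000000   # client-0 xattr on the second brick
# data=8 metadata=0 entry=0 -> 8 pending data ops, but no counter-accusal
```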
Btw, for data and metadata split-brains you can use the gluster CLI
https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md
instead of modifying the files on the bricks from the back end.
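For the archives, the CLI-based resolution described in that document looks roughly like this (a sketch, assuming GlusterFS >= 3.7; the file path follows this thread and <hostname> is a placeholder):

```shell
# List files in split-brain as seen by the self-heal daemon:
gluster volume heal c_glusterfs info split-brain

# Resolve a data/metadata split-brain by picking the bigger file:
gluster volume heal c_glusterfs split-brain bigger-file \
    /logfiles/availability/CELLO_AVAILABILITY2_LOG.xml

# ...or by naming the brick whose copy should win:
gluster volume heal c_glusterfs split-brain source-brick \
    <hostname>:/opt/lvmdir/c2/brick \
    /logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
```

These commands need a live cluster, so treat them as a template rather than something to paste verbatim.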
-Ravi
So, what I have done: I manually deleted the gfid entry of that file from the .glusterfs directory and followed the instructions in the following link to perform the heal
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
and this worked fine for me.
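For completeness, the back-end procedure from that document looks roughly like this (a sketch; BRICK/FILE/GFID are examples taken from the xattr output earlier in the thread, to be adapted to the actual bad copy):

```shell
# On the brick holding the bad copy: remove the file and its gfid
# hard-link, then let self-heal recreate both from the good brick.
BRICK=/opt/lvmdir/c2/brick
FILE=logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
GFID=9f5e354e-cfda-4014-9ddc-e7d5ffe760ae   # trusted.gfid, with dashes

rm -f "$BRICK/$FILE"
# The gfid hard-link lives at .glusterfs/<first 2 chars>/<next 2 chars>/<gfid>
rm -f "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# Trigger the heal from any node:
gluster volume heal c_glusterfs
```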
But my question is: why does the split-brain command not show any file in its output?
Here I am attaching all the logs I collected from the node, along with the output of the commands from both boards.
The tar file contains two directories:
000300 - logs for the board which is running continuously
002500 - logs for the board which was rebooted
I am waiting for your reply; please help me out with this issue.
Thanks in advance.
Regards,
Abhishek
On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL
<[email protected] <mailto:[email protected]>> wrote:
On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N
<[email protected] <mailto:[email protected]>> wrote:
On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:
Yes correct
Okay, so when you say the files are not in sync for some time, are you getting stale data when accessing them from the mount?
I'm not able to figure out why heal info shows zero when the
files are not in sync, despite all IO happening from the
mounts. Could you provide the output of getfattr -d -m . -e
hex /brick/file-name from both bricks when you hit this issue?
I'll provide the logs once I get them. Here, the delay means we power on the second board after 10 minutes.
On Feb 26, 2016 9:57 AM, "Ravishankar N"
<[email protected] <mailto:[email protected]>> wrote:
Hello,
On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
Hi Ravi,
Thanks for the response.
We are using GlusterFS 3.7.8.
Here is the use case:
We have a logging file which saves event logs for every board of a node, and these files are kept in sync using GlusterFS. The system runs in replica-2 mode: when one brick in a replicated volume goes offline, the glusterd daemons on the other nodes keep track of all the files that are not replicated to the offline brick. When the offline brick becomes available again, the cluster initiates a healing process, replicating the updated files to that brick. But in our case, we see that the log file of one board is not in sync and its format is corrupted, i.e. the files are not in sync.
Just to understand you correctly: you have mounted the 2-node replica-2 volume on both these nodes and are writing to a logging file from the mounts, right?
Even the output of #gluster volume heal c_glusterfs info shows that there are no pending heals.
Also, the logging file which is updated is of fixed size, and new entries wrap around, overwriting the old entries.
This way we have seen that after a few restarts the contents of the same file on the two bricks are different, but volume heal info shows zero entries.
Solution:
When we put a delay of more than 5 minutes before the healing, everything works fine.
Regards,
Abhishek
On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N
<[email protected] <mailto:[email protected]>>
wrote:
On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
Hi,
Here I have one query regarding the time taken by the healing process.
In the current two-node setup, when we rebooted one node the self-healing process started within less than a 5-minute interval on the board, resulting in corruption of some files' data.
Heal should start immediately after the brick
process comes up. What version of gluster are you
using? What do you mean by corruption of data? Also,
how did you observe that the heal started after 5
minutes?
-Ravi
To resolve it I searched on Google and found the following link:
https://support.rackspace.com/how-to/glusterfs-troubleshooting/
It mentions that the healing process can take up to 10 minutes to start.
Here is the statement from the link:
"Healing replicated volumes
When any brick in a replicated volume goes offline,
the glusterd daemons on the remaining nodes keep
track of all the files that are not replicated to
the offline brick. When the offline brick becomes
available again, the cluster initiates a healing
process, replicating the updated files to that
brick. *The start of this process can take up to 10
minutes, based on observation.*"
After giving it more than 5 minutes, the file corruption problem is resolved.
So here my question is: is there any way we can reduce the time the healing process takes to start?
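For reference, the interval at which the self-heal daemon crawls for pending heals appears to be tunable per volume (an assumption to verify on your build with `gluster volume set help`; a 600-second default would match the ~10-minute observation above):

```shell
# Hypothetical tuning: lower the self-heal daemon's crawl interval
# from the assumed 600-second default to 60 seconds.
gluster volume set c_glusterfs cluster.heal-timeout 60
```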
Regards,
Abhishek Paliwal
_______________________________________________
Gluster-devel mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-devel
--
Regards
Abhishek Paliwal