Re: [Gluster-users] Does replace-brick migrate data?

Ravishankar N Thu, 30 May 2019 21:57:40 -0700


On 31/05/19 3:20 AM, Alan Orth wrote:

Dear Ravi,
I spent a bit of time inspecting the xattrs on some files and directories on a few bricks for this volume and it looks a bit messy. Even if I could make sense of it for a few and potentially heal them manually, there are millions of files and directories in total, so that's definitely not a scalable solution. After a few missteps with `replace-brick ... commit force` in the last week (one of which was on a brick that was dead/offline), as well as some premature `remove-brick` commands, I'm unsure how to proceed and I'm getting demotivated. It's scary how quickly things get out of hand in distributed systems...

Hi Alan,

The one good thing about gluster is that the data is always available directly on the backend bricks even if your volume has inconsistencies at the gluster level. So theoretically, if your cluster is FUBAR, you could just create a new volume and copy all data onto it via its mount from the old volume's bricks.

I had hoped that bringing the old brick back up would help, but by the time I added it again a few days had passed and all the brick-ids had changed due to the replace/remove brick commands, not to mention that the trusted.afr.$volume-client-xx values were now probably pointing to the wrong bricks (?).
Anyways, a few hours ago I started a full heal on the volume and I see that there is a sustained 100MiB/sec of network traffic going from the old brick's host to the new one. The completed heals reported in the logs look promising too:
Old brick host:
# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 281614 Completed data selfheal
     84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:
# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal
So that's good I guess, though I have no idea how long it will take or if it will fix the "missing files" issue on the FUSE mount. I've increased cluster.shd-max-threads to 8 to hopefully speed up the heal process.

The afr xattrs should not cause files to disappear from the mount. If the xattr names do not match what each AFR subvol expects for its children (e.g., in a replica 2 volume, trusted.afr.*-client-{0,1} for the 1st subvol, client-{2,3} for the 2nd subvol, and so on), then it won't heal the data, that is all. But in your case I see some inconsistencies, like one brick having the actual file (licenseserver.cfg) and the other having a linkto file (the one with the dht.linkto xattr) /in the same replica pair/.
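
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: it assumes brick-ids follow the brick order shown by `gluster volume info` (first brick is client-0), which may no longer hold after replace-brick/remove-brick/add-brick, so the brick-id values under /var/lib/glusterd/vols/apps/bricks remain the authority. The function name `expected_afr_xattrs` is hypothetical.

```python
def expected_afr_xattrs(volume, num_bricks, replica):
    """Map each replicate subvolume to the AFR xattr names its children carry.

    Assumption (not verified against any volfile): client indices are
    assigned in brick order, starting at 0.
    """
    if num_bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    layout = {}
    for subvol in range(num_bricks // replica):
        first = subvol * replica
        layout[f"{volume}-replicate-{subvol}"] = [
            f"trusted.afr.{volume}-client-{i}"
            for i in range(first, first + replica)
        ]
    return layout


# The "apps" volume in this thread is 3 x 2 = 6 bricks.
layout = expected_afr_xattrs("apps", num_bricks=6, replica=2)
for subvol, names in layout.items():
    print(subvol, names)
```

For the 3 x 2 apps volume this gives the pairs client-{0,1}, client-{2,3} and client-{4,5}. Under that numbering, the client-3 and client-5 xattrs on the wingu0 copy quoted below would belong to two different pairs.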


I'd be happy for any advice or pointers,

Did you check if the .glusterfs hardlinks/symlinks exist and are in order for all bricks?
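
A rough Python sketch of such a check for a single entry, run against a brick directory (not the FUSE mount). It relies on the usual backend layout, where an entry with a given gfid lives at .glusterfs/<first two hex digits>/<next two>/<gfid as UUID>, as a hardlink for files and a symlink for directories. The helper names are hypothetical, and it only looks at one entry at a time.

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Build the .glusterfs path for a gfid as printed by `getfattr -e hex`."""
    if gfid_hex.startswith("0x"):
        gfid_hex = gfid_hex[2:]
    gfid = str(uuid.UUID(hex=gfid_hex))  # e.g. 878003a2-fb52-...
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def check_gfid_link(brick_root, rel_path, gfid_hex):
    """Return a short status string for one file or directory on a brick."""
    named = os.path.join(brick_root, rel_path)
    backend = gfid_backend_path(brick_root, gfid_hex)
    if not os.path.lexists(named):
        return "no-such-entry"
    if not os.path.lexists(backend):
        return "gfid-link-missing"
    if os.path.isdir(named) and not os.path.islink(named):
        # Directories are expected to be represented by a symlink.
        return "ok" if os.path.islink(backend) else "not-a-symlink"
    # Regular files are expected to share an inode with the gfid entry.
    return "ok" if os.path.samefile(named, backend) else "not-a-hardlink"


# gfid taken from the licenseserver.cfg xattrs quoted further down.
path = gfid_backend_path("/mnt/gluster/apps", "0x878003a2fb5243b6a0d14d2f8b4306bd")
print(path)
```

On a brick host one would then call, for example, `check_gfid_link("/mnt/gluster/apps", "clcgenomics/clclicsrv/licenseserver.cfg", "0x878003a2fb5243b6a0d14d2f8b4306bd")` and repeat that per brick.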


-Ravi

On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote:


    Dear Ravi,

    Thank you for the link to the blog post series—it is very
    informative and current! If I understand your blog post correctly
    then I think the answer to your previous question about pending
    AFRs is: no, there are no pending AFRs. I have identified one file
    that is a good test case to try to understand what happened after
    I issued the `gluster volume replace-brick ... commit force` a few
    days ago and then added the same original brick back to the volume
    later. This is the current state of the replica 2
    distribute/replicate volume:

    [root@wingu0 ~]# gluster volume info apps

    Volume Name: apps
    Type: Distributed-Replicate
    Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 3 x 2 = 6
    Transport-type: tcp
    Bricks:
    Brick1: wingu3:/mnt/gluster/apps
    Brick2: wingu4:/mnt/gluster/apps
    Brick3: wingu05:/data/glusterfs/sdb/apps
    Brick4: wingu06:/data/glusterfs/sdb/apps
    Brick5: wingu0:/mnt/gluster/apps
    Brick6: wingu05:/data/glusterfs/sdc/apps
    Options Reconfigured:
    diagnostics.client-log-level: DEBUG
    storage.health-check-interval: 10
    nfs.disable: on

    I checked the xattrs of one file that is missing from the volume's
    FUSE mount (though I can read it if I access its full path
    explicitly), but is present in several of the volume's bricks
    (some with full size, others empty):

    [root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
    getfattr: Removing leading '/' from absolute path names
    # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
    security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
    trusted.afr.apps-client-3=0x000000000000000000000000
    trusted.afr.apps-client-5=0x000000000000000000000000
    trusted.afr.dirty=0x000000000000000000000000
    trusted.bit-rot.version=0x0200000000000000585a396f00046e15
    trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd

    [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
    security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
    trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
    trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
    trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200

    [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
    security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
    trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
    trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667

    [root@wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
    security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
    trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
    trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
    trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
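
The hex values in these dumps are easier to read once decoded. Below is a small Python sketch that does so for the values quoted above; the helper names are hypothetical. Reading the 12-byte trusted.afr.* value as three big-endian 32-bit counters (data, metadata, entry) is an assumption about the AFR changelog layout, not something stated in this thread.

```python
import struct


def xattr_bytes(hex_value):
    """Convert a `getfattr -e hex` value such as '0x6170...' to raw bytes."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    return bytes.fromhex(hex_value)


def xattr_text(hex_value):
    """Decode a string-valued xattr, dropping any trailing NUL bytes."""
    return xattr_bytes(hex_value).rstrip(b"\x00").decode("utf-8")


def afr_pending(hex_value):
    """Split an AFR value into counters (assumed order: data, metadata, entry)."""
    data, metadata, entry = struct.unpack(">III", xattr_bytes(hex_value))
    return {"data": data, "metadata": metadata, "entry": entry}


linkto = xattr_text("0x617070732d7265706c69636174652d3200")
gfid2path = xattr_text(
    "0x34666437323861612d356462392d343836382d616232662d"
    "6564393031636566333561392f6c6963656e73657365727665722e636667"
)
pending = afr_pending("0x000000000000000000000000")

print(linkto)     # apps-replicate-2
print(gfid2path)  # 4fd728aa-5db9-4868-ab2f-ed901cef35a9/licenseserver.cfg
print(pending)    # {'data': 0, 'metadata': 0, 'entry': 0}
```

So the linkto files on wingu05 (sdb) and wingu06 point at the apps-replicate-2 subvolume, and the gfid2path value is the parent directory's gfid followed by the file name.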

    According to the trusted.afr.apps-client-xx xattrs this particular
    file should be on bricks with id "apps-client-3" and
    "apps-client-5". It took me a few hours to realize that the
    brick-id values are recorded in the volume's volfiles in
    /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id
    values with a volfile backup from before the replace-brick, I
    realized that the files are simply on the wrong brick now as far
    as Gluster is concerned. This particular file is now on the brick
    for "apps-client-4". As an experiment I copied this one file to
    the two bricks listed in the xattrs and I was then able to see the
    file from the FUSE mount (yay!).

    Other than replacing the brick, removing it, and then adding the
    old brick on the original server back, there has been no change in
    the data this entire time. Can I change the brick IDs in the
    volfiles so they reflect where the data actually is? Or perhaps
    script something to reset all the xattrs on the files/directories
    to point to the correct bricks?

    Thank you for any help or pointers,

    On Wed, May 29, 2019 at 7:24 AM Ravishankar N wrote:


        On 29/05/19 9:50 AM, Ravishankar N wrote:



        On 29/05/19 3:59 AM, Alan Orth wrote:

        Dear Ravishankar,

        I'm not sure if Brick4 had pending AFRs because I don't know
        what that means and it's been a few days so I am not sure I
        would be able to find that information.

        When you find some time, have a look at a blog
        <http://wp.me/peiBB-6b> series I wrote about AFR- I've tried
        to explain what one needs to know to debug replication
        related issues in it.


        Made a typo error. The URL for the blog is https://wp.me/peiBB-6b

        -Ravi


        Anyways, after wasting a few days rsyncing the old brick to
        a new host I decided to just try to add the old brick back
        into the volume instead of bringing it up on the new host. I
        created a new brick directory on the old host, moved the old
        brick's contents into that new directory (minus the
        .glusterfs directory), added the new brick to the volume,
        and then did Vlad's find/stat trick¹ from the brick to the
        FUSE mount point.

        The interesting problem I have now is that some files don't
        appear in the FUSE mount's directory listings, but I can
        actually list them directly and even read them. What could
        cause that?

        Not sure, too many variables in the hacks that you did to
        take a guess. You can check if the contents of the .glusterfs
        folder are in order on the new brick (for example, that hardlinks
        for files and symlinks for directories are present, etc.).
        Regards,
        Ravi


        Thanks,

        ¹ https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html

        On Fri, May 24, 2019 at 4:59 PM Ravishankar N wrote:


            On 23/05/19 2:40 AM, Alan Orth wrote:

            Dear list,

            I seem to have gotten into a tricky situation. Today I
            brought up a shiny new server with new disk arrays and
            attempted to replace one brick of a replica 2
            distribute/replicate volume on an older server using
            the `replace-brick` command:

            # gluster volume replace-brick homes \
                wingu0:/mnt/gluster/homes \
                wingu06:/data/glusterfs/sdb/homes commit force

            The command was successful and I see the new brick in
            the output of `gluster volume info`. The problem is
            that Gluster doesn't seem to be migrating the data,


            `replace-brick` definitely must heal (not migrate) the
            data. In your case, data must have been healed from
            Brick-4 to the replaced Brick-3. Are there any errors in
            the self-heal daemon logs of Brick-4's node? Does
            Brick-4 have pending AFR xattrs blaming Brick-3? The doc
            is a bit out of date. replace-brick command internally
            does all the setfattr steps that are mentioned in the doc.

            -Ravi

            and now the original brick that I replaced is no longer
            part of the volume (and a few terabytes of data are
            just sitting on the old brick):

            # gluster volume info homes | grep -E "Brick[0-9]:"
            Brick1: wingu4:/mnt/gluster/homes
            Brick2: wingu3:/mnt/gluster/homes
            Brick3: wingu06:/data/glusterfs/sdb/homes
            Brick4: wingu05:/data/glusterfs/sdb/homes
            Brick5: wingu05:/data/glusterfs/sdc/homes
            Brick6: wingu06:/data/glusterfs/sdc/homes

            I see the Gluster docs have a more complicated
            procedure for replacing bricks that involves
            getfattr/setfattr¹. How can I tell Gluster about the
            old brick? I see that I have a backup of the old
            volfile thanks to yum's rpmsave function if that helps.

            We are using Gluster 5.6 on CentOS 7. Thank you for any
            advice you can give.

            ¹ https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick

            --
            Alan Orth

            [email protected] <mailto:[email protected]>
            https://picturingjordan.com
            https://englishbulgaria.net
            https://mjanja.ch
            "In heaven all the interesting people are missing."
            ―Friedrich Nietzsche

            _______________________________________________
            Gluster-users mailing list
            [email protected]  <mailto:[email protected]>
            https://lists.gluster.org/mailman/listinfo/gluster-users





