On 06/29/2017 01:08 PM, Paolo Margara wrote:
Hi all,
for the upgrade I followed this procedure:
* put node in maintenance mode (ensure no clients are active)
* yum versionlock delete glusterfs*
* service glusterd stop
* yum update
* systemctl daemon-reload
* service glusterd start
* yum versionlock add glusterfs*
* gluster volume heal vm-images-repo full
* gluster volume heal vm-images-repo info
On each server I ran 'gluster --version' every time to confirm the
upgrade; at the end I ran 'gluster volume set all cluster.op-version
30800'.
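For reference, here are the same per-node steps collected into a rough
script (a sketch, not a tested one-shot); the op-version commands at the
end run only once, after every node is on 3.8.12, and reading
glusterd.info is just one way to confirm the bump took effect:

    #!/bin/bash
    # Per-node rolling-upgrade steps (sketch; run on one node at a time).
    set -e
    yum versionlock delete 'glusterfs*'   # unpin the gluster packages
    service glusterd stop
    yum -y update
    systemctl daemon-reload
    service glusterd start
    yum versionlock add 'glusterfs*'      # pin the new version again
    gluster --version                     # confirm the upgraded binaries

    # Trigger a full heal and watch it before moving to the next node.
    gluster volume heal vm-images-repo full
    gluster volume heal vm-images-repo info

    # After ALL nodes are upgraded (run once, on any node):
    # gluster volume set all cluster.op-version 30800
    # grep operating-version /var/lib/glusterd/glusterd.info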
Today I tried manually killing a brick process on a non-critical
volume; after that I see this in the log:
[2017-06-29 07:03:50.074388] I [MSGID: 100030]
[glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started running
/usr/sbin/glusterfsd version 3.8.12 (args: /usr/sbin/glusterfsd -s
virtnode-0-1-gluster --volfile-id
iso-images-repo.virtnode-0-1-gluster.data-glusterfs-brick1b-iso-images-repo
-p
/var/lib/glusterd/vols/iso-images-repo/run/virtnode-0-1-gluster-data-glusterfs-brick1b-iso-images-repo.pid
-S /var/run/gluster/c779852c21e2a91eaabbdda3b9127262.socket
--brick-name /data/glusterfs/brick1b/iso-images-repo -l
/var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log
--xlator-option
*-posix.glusterd-uuid=e93ebee7-5d95-4100-a9df-4a3e60134b73
--brick-port 49163 --xlator-option
iso-images-repo-server.listen-port=49163)
I've checked after the restart and indeed the 'entry-changes'
directory is now created, but why did stopping the glusterd service
not also stop the brick processes?
Just stopping, upgrading, and restarting glusterd does not restart the
brick processes; you would need to kill all gluster processes on the
node before upgrading. After upgrading, when you restart glusterd, it
will automatically spawn the rest of the gluster processes on that node.
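Something along these lines on each node, before the 'yum update',
should do it (a sketch; the killall targets are the standard gluster
daemon names used in the upgrade guide linked below):

    # Stopping glusterd alone leaves bricks and self-heal daemons
    # running, so make sure no gluster processes survive before
    # upgrading the packages.
    service glusterd stop
    killall glusterfsd glusterfs glusterd || true
    pgrep -l gluster   # should print nothing before you run 'yum update'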
Now, how can I recover from this issue? Is restarting all brick
processes enough?
Yes, but ensure there are no pending heals like Pranith mentioned.
https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.7/
lists the steps for upgrading to 3.7, but the steps mentioned there are
similar for any rolling upgrade.
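Concretely, something like this on the affected node should bring the
bricks back up (a sketch, using the volume names from this thread;
'start ... force' only respawns bricks that are down):

    # Respawn any dead brick processes, volume by volume, then make
    # sure there is nothing left to heal before touching another node.
    for vol in iso-images-repo vm-images-repo hosted-engine; do
        gluster volume start "$vol" force
        gluster volume heal "$vol" info
    done
    gluster volume status   # every brick should show Online: Y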
-Ravi
Greetings,
Paolo Margara
On 28/06/2017 18:41, Pranith Kumar Karampuri wrote:
On Wed, Jun 28, 2017 at 9:45 PM, Ravishankar N
<[email protected]> wrote:
On 06/28/2017 06:52 PM, Paolo Margara wrote:
Hi list,
yesterday I noted the following lines into the glustershd.log
log file:
[2017-06-28 11:53:05.000890] W [MSGID: 108034]
[afr-self-heald.c:479:afr_shd_index_sweep]
0-iso-images-repo-replicate-0: unable to get index-dir on
iso-images-repo-client-0
[2017-06-28 11:53:05.001146] W [MSGID: 108034]
[afr-self-heald.c:479:afr_shd_index_sweep]
0-vm-images-repo-replicate-0:
unable to get index-dir on vm-images-repo-client-0
[2017-06-28 11:53:06.001141] W [MSGID: 108034]
[afr-self-heald.c:479:afr_shd_index_sweep]
0-hosted-engine-replicate-0:
unable to get index-dir on hosted-engine-client-0
[2017-06-28 11:53:08.001094] W [MSGID: 108034]
[afr-self-heald.c:479:afr_shd_index_sweep]
0-vm-images-repo-replicate-2:
unable to get index-dir on vm-images-repo-client-6
[2017-06-28 11:53:08.001170] W [MSGID: 108034]
[afr-self-heald.c:479:afr_shd_index_sweep]
0-vm-images-repo-replicate-1:
unable to get index-dir on vm-images-repo-client-3
Digging into the mailing list archive I found another user with a
similar issue (the thread was '[Gluster-users] glustershd: unable to get
index-dir on myvolume-client-0'); the solution suggested was to verify
whether the /<path-to-backend-brick>/.glusterfs/indices directory
contains all of these subdirectories: 'dirty', 'entry-changes', and
'xattrop', and, if some of them do not exist, to simply create them
with mkdir.
In my case the 'entry-changes' directory is not present on any of the
bricks on any of the servers:
/data/glusterfs/brick1a/hosted-engine/.glusterfs/indices/:
total 0
drw------- 2 root root 55 Jun 28 15:02 dirty
drw------- 2 root root 57 Jun 28 15:02 xattrop
/data/glusterfs/brick1b/iso-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 55 May 29 14:04 dirty
drw------- 2 root root 57 May 29 14:04 xattrop
/data/glusterfs/brick2/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 112 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
/data/glusterfs/brick3/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 64 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
/data/glusterfs/brick4/vm-images-repo/.glusterfs/indices/:
total 0
drw------- 2 root root 112 Jun 28 15:02 dirty
drw------- 2 root root 66 Jun 28 15:02 xattrop
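For what it's worth, that mkdir fix would amount to something like the
sketch below, run on every server with the brick paths listed above (a
sketch only; the 0600 mode matches the existing index directories):

    # Create the missing 'entry-changes' index directory on each brick
    # (sketch, per the earlier mailing-list thread).
    for brick in /data/glusterfs/brick1a/hosted-engine \
                 /data/glusterfs/brick1b/iso-images-repo \
                 /data/glusterfs/brick2/vm-images-repo \
                 /data/glusterfs/brick3/vm-images-repo \
                 /data/glusterfs/brick4/vm-images-repo; do
        mkdir -p -m 600 "$brick/.glusterfs/indices/entry-changes"
    done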
I recently upgraded gluster from 3.7.16 to 3.8.12 with the rolling
upgrade procedure and hadn't noticed this issue prior to the update;
on another system upgraded with the same procedure I haven't
encountered this problem.
Currently all VM images appear to be OK, but prior to creating the
'entry-changes' directories I would like to ask whether this is still
the correct procedure to fix this issue
Did you restart the bricks after the upgrade? That should have
created the entry-changes directory. Can you kill the brick and
restart it and see if the dir is created? Double-check from the
brick logs that you're indeed running 3.8.12: "Started running
/usr/sbin/glusterfsd version 3.8.12" should appear when the
brick starts.
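For example (a sketch for a single brick, using the paths from this
thread; the PID comes from 'gluster volume status'):

    # Kill one brick, respawn it, and check the version the brick
    # logs on startup.
    gluster volume status iso-images-repo        # note the brick's PID
    kill <brick-pid>                             # PID from the status output
    gluster volume start iso-images-repo force   # respawn the killed brick
    grep 'Started running' \
        /var/log/glusterfs/bricks/data-glusterfs-brick1b-iso-images-repo.log \
        | tail -1
    # Expect: "Started running /usr/sbin/glusterfsd version 3.8.12"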
Please note that if you are going the route of killing and
restarting, you need to do it in the same way you did the rolling
upgrade: wait for the heal to complete before you kill the bricks on
the next node. But before you do this, it is better to look at the
logs or confirm the steps you used for the upgrade.
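That is, something like this crude polling loop between nodes, per
volume (a sketch; 'Number of entries' is the per-brick counter in the
heal info output):

    # Wait until 'heal info' reports zero pending entries on every
    # brick of the volume before restarting bricks on the next node.
    vol=vm-images-repo
    while gluster volume heal "$vol" info | grep -q 'Number of entries: [1-9]'; do
        sleep 30
    done
    echo "no pending heals on $vol"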
-Ravi
and whether this problem could have affected the heal operations that
occurred in the meantime.
Thanks.
Greetings,
Paolo Margara
--
Pranith
--
LABINF - HPC@POLITO
DAUIN - Politecnico di Torino
Corso Castelfidardo, 34D - 10129 Torino (TO)
phone: +39 011 090 7051
site: http://www.labinf.polito.it/
site: http://hpc.polito.it/
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users