It seems to me that what you need to do is replace the failed brick, or
simply rebuild the filesystem and let gluster attempt to restore it from
a *clean* filesystem.
I haven't seen any mechanism that allows gluster to actually change the
replication count on a live volume, which is what you seem to be
requesting.
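If replacing the brick is the route taken, the procedure might look roughly
like this (a sketch only: the volume name, hostnames, and brick paths are
taken from the thread below, the new brick path /zfs-test-new is an
assumption, and the replace-brick syntax varies between gluster releases,
so check `gluster volume help` on your version first):

```shell
# Replace the corrupted brick on gluster3 with a freshly created one.
# Assumes a new, clean zfs filesystem has been mounted at /zfs-test-new.
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new start

# Poll until the data migration reports complete, then commit the change.
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new status
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new commit
```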
On 01/09/2013 07:57 AM, Liang Ma wrote:
Todd,
Thanks for your reply. But how can I take this brick offline? Since
the gluster volume has a replica count of 2, it won't allow me to remove
one brick. Is there a command which can take one replica brick offline?
Many thanks.
Liang
On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <[email protected]> wrote:
Liang,
I don't claim to know the answer to your question, and my knowledge of
zfs is minimal at best, so I may be way off base here, but it seems to
me that your attempted random corruption with this command:

dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

is likely going to corrupt the underlying zfs filesystem metadata, not
just file data, and I wouldn't expect gluster to be able to fix a
brick's corrupted filesystem. Perhaps you now have to take the brick
offline, fix any zfs filesystem errors if possible, bring the brick
back online, and see what then happens with self-heal.
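The offline-repair cycle described above might be sketched as follows
(assumptions: whether zfs can actually repair the damage is doubtful on a
single-disk pool like this one, since there is no redundant copy to heal
from; the `volume heal` subcommands shown were introduced in the gluster
3.3 series):

```shell
# On gluster3: scrub the damaged pool and list any permanently
# damaged files.
zpool scrub zfs-test
zpool status -v zfs-test

# Once the brick is healthy and back online, trigger and monitor a
# full self-heal from the good replica.
gluster volume heal gtest full
gluster volume heal gtest info
```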
--
Todd Pfaff <[email protected]>
http://www.rhpcs.mcmaster.ca/
On Tue, 8 Jan 2013, Liang Ma wrote:
Hi There,
I'd like to test and understand the self-heal feature of glusterfs.
This is what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS.

gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
where zfs-test is a zfs pool on partition /dev/sda6 in both nodes.
To simulate random corruption on node gluster3:

dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

Now zfs detected the corrupted files:
  pool: zfs-test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs-test    ONLINE       0     0 2.29K
          sda6      ONLINE       0     0 4.59K

errors: Permanent errors have been detected in the following files:

        /zfs-test/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
Now the gluster log file shows that self-heal can't fix the corruption:

[2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
[2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
[2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
[2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
where K.iso is one of the sample files affected by the dd command.
So could anyone tell me the best way to repair the simulated corruption?
Thank you.
Liang
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
Daniel Taylor VP Operations Vocal Laboratories, Inc
[email protected] 612-235-5711