Liang,

I suppose my choice of words was misleading. What I mean is:

- unmount the corrupted brick filesystem
- try to check and repair the brick filesystem
- if repair fails, re-create the filesystem
- remount the brick filesystem

but, as I said, I'm not very familiar with zfs. Based on my quick glance at some zfs documentation it sounds to me like online zfs check-and-repair may be possible (this is Oracle zfs documentation and I have no idea how the Linux zfs implementation compares):

http://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html

but since you're a zfs user you likely already know much more about zfs than I do.

Todd

On Wed, 9 Jan 2013, Liang Ma wrote:
Todd,

Thanks for your reply. But how can I take this brick offline? Since the gluster volume has replica count 2, it won't allow me to remove one brick. Is there a command which can take one replica brick offline?

Many thanks.

Liang

On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <[email protected]> wrote:

Liang,

I don't claim to know the answer to your question, and my knowledge of zfs is minimal at best so I may be way off base here, but it seems to me that your attempted random corruption with this command:

    dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

is likely going to corrupt the underlying zfs filesystem metadata, not just file data, and I wouldn't expect gluster to be able to fix a brick's corrupted filesystem. Perhaps you now have to take the brick offline, fix any zfs filesystem errors if possible, bring the brick back online and see what then happens with self-heal.

--
Todd Pfaff <[email protected]>
http://www.rhpcs.mcmaster.ca/

On Tue, 8 Jan 2013, Liang Ma wrote:

Hi There,

I'd like to test and understand the self heal feature of glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS:

    gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test

where zfs-test is a zfs pool on partition /dev/sda6 on both nodes. To simulate a random corruption, on node gluster3:

    dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

Now zfs detected the corrupted files:

      pool: zfs-test
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: none requested
    config:

        NAME      STATE  READ WRITE CKSUM
        zfs-test  ONLINE     0     0 2.29K
          sda6    ONLINE     0     0 4.59K

    errors: Permanent errors have been detected in the following files:

        /zfs-test/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid

Now the gluster log file shows the self heal can't fix the corruption:

    [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
    [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
    [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
    [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)

where K.iso is one of the sample files affected by the dd command.

So could anyone tell me what is the best way to repair the simulated corruption? Thank you.

Liang
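Putting the pieces of this thread together, the offline-repair sequence Todd describes might look roughly like the following on gluster3. This is a sketch only: the brick PID placeholder must be taken from the `gluster volume status` output, mapping the damaged gfid link b01ec17c-... to K.iso is an assumption based on the zpool and gluster output above, and the commands assume the GlusterFS 3.3 CLI used in this thread.

```shell
## Sketch only -- run on gluster3, the node with the corrupted brick.

# 1. Take just this brick offline by stopping its glusterfsd process.
#    'gluster volume status' lists each brick's PID; with replica 2 the
#    volume stays available through the gluster4 brick in the meantime.
gluster volume status gtest
BRICK_PID=12345   # placeholder: use the PID shown for gluster3:/zfs-test
kill "$BRICK_PID"

# 2. Have zfs check the pool. On a single-device pool (just sda6, no
#    mirror or raidz) a scrub can detect corruption but cannot repair it.
zpool scrub zfs-test
zpool status -v zfs-test    # lists the permanently damaged files

# 3. Remove the damaged copies on this brick, including the matching
#    .glusterfs gfid hardlink (assumed here to belong to K.iso), then
#    clear the pool's error counters.
rm /zfs-test/K.iso
rm /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
zpool clear zfs-test

# 4. Restart the brick and trigger a full self-heal from gluster4.
gluster volume start gtest force
gluster volume heal gtest full
gluster volume heal gtest info    # check what remains to be healed
```

Removing both the file and its .glusterfs hardlink matters here: self-heal identifies files by gfid, and the "gfid different on subvolume" / "Missing Gfids" errors in the log above come from the two bricks disagreeing about K.iso's identity, which a plain data heal cannot resolve.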
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
