It seems to me that what you need to do is replace the failed brick, or
simply rebuild the filesystem and let gluster attempt to restore it from
a *clean* filesystem.
I haven't seen any mechanism that allows gluster to actually change the
replication count on a live volume, which is what you seem to be
requesting.
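If replacing the brick is the route taken, the procedure might look roughly
like this (a sketch only: the volume name, hostnames, and brick paths are
taken from the thread below, the new brick path /zfs-test-new is an
assumption, and the replace-brick syntax varies between gluster releases,
so check `gluster volume help` on your version first):

```shell
# Replace the corrupted brick on gluster3 with a freshly created one.
# Assumes a new, clean zfs filesystem has been mounted at /zfs-test-new.
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new start

# Poll until the data migration reports complete, then commit the change.
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new status
gluster volume replace-brick gtest \
    gluster3:/zfs-test gluster3:/zfs-test-new commit
```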
On 01/09/2013 07:57 AM, Liang Ma wrote:
Todd,
Thanks for your reply. But how can I take this brick offline? Since
the gluster volume has a replica count of 2, it won't allow me to remove
one brick. Is there a command which can take one replica brick offline?
Many thanks.
Liang
On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <[email protected]> wrote:
Liang,
I don't claim to know the answer to your question, and my knowledge of
zfs is minimal at best, so I may be way off base here, but it seems to
me that your attempted random corruption with this command:

dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

is likely going to corrupt the underlying zfs filesystem metadata, not
just file data, and I wouldn't expect gluster to be able to fix a
brick's corrupted filesystem. Perhaps you now have to take the brick
offline, fix any zfs filesystem errors if possible, bring the brick
back online, and see what then happens with self-heal.
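The offline-repair cycle described above might be sketched as follows
(assumptions: whether zfs can actually repair the damage is doubtful on a
single-disk pool like this one, since there is no redundant copy to heal
from; the `volume heal` subcommands shown were introduced in the gluster
3.3 series):

```shell
# On gluster3: scrub the damaged pool and list any permanently
# damaged files.
zpool scrub zfs-test
zpool status -v zfs-test

# Once the brick is healthy and back online, trigger and monitor a
# full self-heal from the good replica.
gluster volume heal gtest full
gluster volume heal gtest info
```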
--
Todd Pfaff <[email protected]>
http://www.rhpcs.mcmaster.ca/
On Tue, 8 Jan 2013, Liang Ma wrote:
Hi There,
I'd like to test and understand the self-heal feature of glusterfs.
This is what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS.

gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
where zfs-test is a zfs pool on partition /dev/sda6 in both nodes.
To simulate random corruption on node gluster3:

dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

Now zfs detected the corrupted files:
  pool: zfs-test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs-test    ONLINE       0     0 2.29K
          sda6      ONLINE       0     0 4.59K

errors: Permanent errors have been detected in the following files:

        /zfs-test/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
Now the gluster log file shows that self-heal can't fix the corruption:

[2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
[2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
[2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
[2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
where K.iso is one of the sample files affected by the dd command.
So could anyone tell me the best way to repair the simulated corruption?
Thank you.
Liang
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
Daniel Taylor VP Operations Vocal Laboratories, Inc
[email protected] 612-235-5711