Liang,

I suppose my choice of words was misleading. What I mean is:

- unmount the corrupted brick filesystem
- try to check and repair the brick filesystem
- if repair fails, re-create the filesystem
- remount the brick filesystem

but, as I said, I'm not very familiar with zfs. Based on my quick glance at some zfs documentation it sounds to me like online zfs check-and-repair may be possible (this is Oracle zfs documentation and I have no idea how the Linux zfs implementation compares):

http://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html

but since you're a zfs user you likely already know much more about zfs than I do.

Todd

On Wed, 9 Jan 2013, Liang Ma wrote:
Todd,

Thanks for your reply. But how can I take this brick offline? Since the gluster volume has replica count 2, it won't allow me to remove one brick. Is there a command which can take one replica brick offline?

Many thanks.

Liang

On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <[email protected]> wrote:

Liang,

I don't claim to know the answer to your question, and my knowledge of zfs is minimal at best so I may be way off base here, but it seems to me that your attempted random corruption with this command:

    dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

is likely going to corrupt the underlying zfs filesystem metadata, not just file data, and I wouldn't expect gluster to be able to fix a brick's corrupted filesystem. Perhaps you now have to take the brick offline, fix any zfs filesystem errors if possible, bring the brick back online and see what then happens with self-heal.

--
Todd Pfaff <[email protected]>
http://www.rhpcs.mcmaster.ca/

On Tue, 8 Jan 2013, Liang Ma wrote:

Hi There,

I'd like to test and understand the self heal feature of glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS:

    gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test

where zfs-test is a zfs pool on partition /dev/sda6 on both nodes. To simulate a random corruption, on node gluster3:

    dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480

Now zfs detected the corrupted files:

      pool: zfs-test
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: none requested
    config:

        NAME      STATE  READ WRITE CKSUM
        zfs-test  ONLINE     0     0 2.29K
          sda6    ONLINE     0     0 4.59K

    errors: Permanent errors have been detected in the following files:

        /zfs-test/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
        /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
        /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid

Now the gluster log file shows the self heal can't fix the corruption:

    [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
    [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
    [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
    [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)

where K.iso is one of the sample files affected by the dd command.

So could anyone tell me what is the best way to repair the simulated corruption? Thank you.

Liang
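Putting the pieces of this thread together, the offline-repair sequence Todd describes might look roughly like the following on gluster3. This is a sketch only: the brick PID placeholder must be taken from the `gluster volume status` output, mapping the damaged gfid link b01ec17c-... to K.iso is an assumption based on the zpool and gluster output above, and the commands assume the GlusterFS 3.3 CLI used in this thread.

```shell
## Sketch only -- run on gluster3, the node with the corrupted brick.

# 1. Take just this brick offline by stopping its glusterfsd process.
#    'gluster volume status' lists each brick's PID; with replica 2 the
#    volume stays available through the gluster4 brick in the meantime.
gluster volume status gtest
BRICK_PID=12345   # placeholder: use the PID shown for gluster3:/zfs-test
kill "$BRICK_PID"

# 2. Have zfs check the pool. On a single-device pool (just sda6, no
#    mirror or raidz) a scrub can detect corruption but cannot repair it.
zpool scrub zfs-test
zpool status -v zfs-test    # lists the permanently damaged files

# 3. Remove the damaged copies on this brick, including the matching
#    .glusterfs gfid hardlink (assumed here to belong to K.iso), then
#    clear the pool's error counters.
rm /zfs-test/K.iso
rm /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
zpool clear zfs-test

# 4. Restart the brick and trigger a full self-heal from gluster4.
gluster volume start gtest force
gluster volume heal gtest full
gluster volume heal gtest info    # check what remains to be healed
```

Removing both the file and its .glusterfs hardlink matters here: self-heal identifies files by gfid, and the "gfid different on subvolume" / "Missing Gfids" errors in the log above come from the two bricks disagreeing about K.iso's identity, which a plain data heal cannot resolve.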
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
