Self-heal happens whenever a lookup is performed on an inconsistent file.
Commands such as ls -laR and find perform a lookup on every file,
recursively, under the directory we specify.
Let's take an example:
- replica 2 cluster (2 peers) with 500K files
- during the weekend the peer we call '1' disconnects for a short time (say
30 minutes) and, by the time the connection comes back up, about 10K files
have been modified or created.
- on Monday the administrator has no knowledge of the network glitch
(let's suppose he didn't implement any sort of network logging system)
- after 3 days, 1K of the 10K files modified during the network glitch have
still not been accessed; in the afternoon peer 2 crashes hard due to a total
hardware failure (MB replacement needed)
Now we have 1K files that are inaccessible or obsolete!
I think that when a peer comes back, self-healing should start
automatically.
Of course we could write a shell script that tests the network and issues an
'ls -laR' command when needed, but that is a rather dirty solution.
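A minimal sketch of what such a script could look like, assuming a Linux
host with the iputils ping. The function names, the state file and the
up/down bookkeeping are hypothetical, not part of GlusterFS; the crawl is
the find/stat lookup suggested elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: function names and the state file are illustrative assumptions.

# Return 0 if the peer answers one ping within 2 seconds
# (the -W timeout in seconds assumes Linux iputils ping).
peer_is_up() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Stat every entry under the mount point so that each file gets a lookup.
crawl_mount() {
    find "$1" -print0 | xargs -0 stat >/dev/null
}

# Crawl only when the peer goes from "down" to "up".
# Usage: heal_on_transition <up|down> <mount-point> <state-file>
heal_on_transition() {
    current=$1
    mnt=$2
    state_file=$3
    # With no recorded state yet, assume the peer was up.
    previous=$(cat "$state_file" 2>/dev/null || echo up)
    echo "$current" > "$state_file"
    if [ "$previous" = down ] && [ "$current" = up ]; then
        printf 'peer is back, crawling %s\n' "$mnt"
        crawl_mount "$mnt"
    fi
}

# Example cron job body (peer name and paths are placeholders):
#   if peer_is_up peer1; then s=up; else s=down; fi
#   heal_on_transition "$s" /mnt/gluster /var/run/peer1.state
```

This only papers over the problem: a glitch shorter than the polling
interval would still go unnoticed.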
Raf
Pranith.
----- Original Message -----
From: "Mohit Anchlia" <[email protected]>
To: "Pranith Kumar. Karampuri" <[email protected]>,
[email protected]
Sent: Wednesday, March 16, 2011 3:19:13 AM
Subject: Re: [Gluster-users] Best practices after a peer failure?
I thought self-healing was possible only after we run "ls -alR" or
"find ..". It looks like self-healing is supposed to work when a dead node
is brought back up; is that true?
On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
<[email protected]> wrote:
hi R.C.,
Could you please give the exact steps when you log the bug? Please also
give the output of gluster peer status on both the machines after
restart. Zip the files under /usr/local/var/log/glusterfs/ and
/etc/glusterd on both the machines when this issue happens. This should
help us debug the issue.
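The collection could be sketched roughly as follows, to be run on each
machine. The function name, the output file names and the use of tar
instead of zip are assumptions made for illustration; the two default
directories are the ones named above.

```shell
#!/bin/sh
# Sketch only: collect_diag and the output file names are hypothetical.
# Usage: collect_diag <output-dir> [log-dir] [config-dir]
collect_diag() {
    out=$1
    logs=${2:-/usr/local/var/log/glusterfs}
    conf=${3:-/etc/glusterd}
    mkdir -p "$out" || return 1
    # Record peer status if the gluster CLI is installed on this machine.
    if command -v gluster >/dev/null 2>&1; then
        gluster peer status > "$out/peer-status.txt" 2>&1
    fi
    # Bundle the logs and the glusterd directory into a single archive.
    tar -czf "$out/gluster-diag-$(uname -n).tar.gz" "$logs" "$conf"
}
```

For example, collect_diag /tmp/gluster-diag would leave one archive per
host, named after the machine, ready to attach to the bug report.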
Thanks
Pranith.
----- Original Message -----
From: "R.C." <[email protected]>
To: [email protected]
Sent: Tuesday, March 15, 2011 4:14:24 PM
Subject: Re: [Gluster-users] Best practices after a peer failure?
I've figured out the problem.
If you mount the GlusterFS volume with the native client on a peer and
another peer crashes, the volume doesn't self-heal after that peer reboots.
Should I put this issue in the bug tracker?
Bye
Raf
----- Original Message -----
From: "R.C." <[email protected]>
To: <[email protected]>
Sent: Monday, March 14, 2011 11:41 PM
Subject: Best practices after a peer failure?
Hello to the list.
I'm experimenting with GlusterFS in various topologies by means of multiple
VirtualBox VMs.
Like any system administrator, I'm mainly interested in disaster
recovery scenarios. The first is a replica 2 configuration, with one
peer crashing (actually, the VM being stopped abruptly) while data is being
written to the volume.
After rebooting the stopped VM and relaunching the gluster daemon
(service glusterd start), the cluster doesn't start healing by itself.
I've also tried the suggested commands:
find <gluster-mount> -print0 | xargs --null stat >/dev/null
and
find <gluster-mount> -type f -exec dd if='{}' of=/dev/null bs=1M \; > /dev/null 2>&1
without success.
A rebalance command recreates the replicas but, when accessing the cluster,
the always-alive client is the only one committing data to disk.
Where am I going wrong?
Thank you for your support.
Raf
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users