Re: [Gluster-users] Best practices after a peer failure?

R.C. Wed, 16 Mar 2011 04:00:53 -0700

Self-heal happens whenever a lookup happens on an inconsistent file. The commands ls -laR and find perform a lookup on every file recursively under the directory we specify.


Let's take an example:
- replica 2 cluster (2 peers) with 500K files

- during the weekend the peer we call '1' disconnects for a short time (say 30 minutes); by the time the connection comes back up, about 10K files have been modified or created.
- on Monday the administrator has no knowledge of the network glitch (let's suppose he didn't implement any sort of network logging system).
- after 3 days, 1K of the 10K files modified during the network glitch are still unaccessed; in the afternoon peer 2 hard crashes due to a total hardware failure (motherboard replacement needed).

Now we have 1K files that are inaccessible or obsolete!

I think that when a peer comes back, self-healing should start automatically. Of course we could write a shell script that tests the network and issues an 'ls -laR' command when needed, but this is a sort of dirty solution.
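The "dirty" script described above could look roughly like the sketch below. It is a hypothetical example, not a GlusterFS tool: PEER, MOUNT and INTERVAL are placeholder settings, peer_up assumes Linux iputils ping (where -W is a timeout in seconds), and the crawl relies on the premise stated in this thread that a lookup on an inconsistent file triggers self-heal.

```shell
#!/bin/sh
# Hypothetical watcher: crawl the mount when a replica peer comes back, so that
# every file gets a lookup (which, per this thread, is what triggers self-heal).
# PEER, MOUNT and INTERVAL are placeholder settings, not values from the thread.
PEER="${PEER:-peer2.example.com}"
MOUNT="${MOUNT:-/mnt/gluster}"
INTERVAL="${INTERVAL:-60}"

# Succeeds if the peer answers one ping within 2 seconds (Linux iputils -W).
peer_up() {
    ping -c 1 -W 2 "$PEER" >/dev/null 2>&1
}

# Stat every entry under $1; each stat forces a lookup on that path.
crawl() {
    find "$1" -print0 | xargs -0 stat >/dev/null 2>&1
}

# One polling step. $1 is the previous state ("up" or "down").
# Prints the new state and crawls $MOUNT on a down -> up transition.
poll_once() {
    if peer_up; then
        [ "$1" = down ] && crawl "$MOUNT"
        echo up
    else
        echo down
    fi
}

# Poll forever; assumes the peer is up when the watcher starts.
watch_peer() {
    state=up
    while :; do
        state=$(poll_once "$state")
        sleep "$INTERVAL"
    done
}
```

To use it, call watch_peer at the end of the script and run it in the background on a client that has the volume mounted. It only crawls on a down-to-up transition, so a full recursive lookup happens once per outage rather than on every poll.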

Raf

Pranith.

----- Original Message -----
From: "Mohit Anchlia" <[email protected]>
To: "Pranith Kumar. Karampuri" <[email protected]>, [email protected]
Sent: Wednesday, March 16, 2011 3:19:13 AM
Subject: Re: [Gluster-users] Best practices after a peer failure?

I thought self-healing was possible only after we run "ls -alR" or "find
..". It looks like self-healing is supposed to work when a dead node is
brought up; is that true?

On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
<[email protected]> wrote:

Hi R.C.,

Could you please give the exact steps when you log the bug? Please also give the output of "gluster peer status" on both machines after the restart, and zip the files under /usr/local/var/log/glusterfs/ and /etc/glusterd on both machines when this issue happens. This should help us debug the issue.
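The collection step could be scripted along these lines. This is a hypothetical helper (collect_diag is not a Gluster command): it writes a tar.gz instead of a zip, records "gluster peer status" only when the gluster CLI is on the PATH, and defaults to the two directories named above, which may live elsewhere on other installs.

```shell
#!/bin/sh
# Hypothetical helper: bundle the requested diagnostics into one archive.
# Usage: collect_diag OUTPUT.tar.gz [DIR...]
# Uses tar+gzip instead of zip (an assumption; zip would work as well).
collect_diag() {
    out="$1"; shift
    # Default to the directories named in the bug-report request.
    [ "$#" -gt 0 ] || set -- /usr/local/var/log/glusterfs /etc/glusterd
    work=$(mktemp -d) || return 1
    # Record peer status only if the gluster CLI exists on this host.
    if command -v gluster >/dev/null 2>&1; then
        gluster peer status > "$work/peer-status.txt" 2>&1
    fi
    # Copy each requested directory that exists into the bundle.
    for d in "$@"; do
        [ -d "$d" ] && cp -R "$d" "$work/"
    done
    tar -czf "$out" -C "$work" .
    status=$?
    rm -rf "$work"
    return "$status"
}
```

For example, running collect_diag "/tmp/diag-$(hostname).tar.gz" as root on each peer yields one archive per machine to attach to the bug report.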

Thanks
Pranith.

----- Original Message -----
From: "R.C." <[email protected]>
To: [email protected]
Sent: Tuesday, March 15, 2011 4:14:24 PM
Subject: Re: [Gluster-users] Best practices after a peer failure?

I've figured out the problem.

If you mount the GlusterFS volume with the native client on a peer and another
peer crashes, the crashed peer doesn't self-heal after it reboots.

Should I put this issue in the bug tracker?

Bye

Raf

----- Original Message -----
From: "R.C." <[email protected]>
To: <[email protected]>
Sent: Monday, March 14, 2011 11:41 PM
Subject: Best practices after a peer failure?

Hello to the list.

I'm practicing GlusterFS in various topologies by means of multiple
Virtualbox VMs.

Like any system administrator, I'm mainly interested in disaster
recovery scenarios. The first is a replica 2 configuration, with one
peer crashing (actually, stopping the VM abruptly) while data is being
written to the volume.

After rebooting the stopped VM and relaunching the gluster daemon (service
glusterd start), the cluster doesn't start healing by itself.
I've also tried the suggested commands:
find <gluster-mount> -print0 | xargs --null stat >/dev/null
and
find <gluster-mount> -type f -exec dd if='{}' of=/dev/null bs=1M \; > /dev/null 2>&1
without success.
A rebalance command recreates the replicas, but when accessing the cluster,
the always-alive client is the only one committing data to disk.

Where am I going wrong?

Thank you for your support.

Raf


_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users