Restarts will go through a shutdown process. As long as the network
isn't actively unconfigured before the final kill, the TCP connection
will be shut down and there will be no wait.
On 12/28/17 20:19, Sam McLeod wrote:
Sure, if you never restart / autoscale anything and if your use case
isn't bothered by up to 42 seconds of downtime. For us, 42 seconds
is a really long time for something like a patient management system
to refuse file attachment uploads etc...
We apply a strict patching policy for security and kernel updates. We
also often load balance between underlying physical hosts, and if the
virtual hosts have lots of storage it can be quicker to let them
shut down and start on another host.
So for us, gone are the old Unix days of caring about uptime. A huge
part of how we measure success and risk reduction has become how
quickly we can not just deploy our software / web apps into production,
but also how quickly our platform can be reformed, patched and
migrated as needed.
So in reality, I'd probably do a rolling restart of our three-node
Gluster clusters every few weeks or so, depending on what patches have
been released etc...
--
Sam McLeod
https://smcleod.net
https://twitter.com/s_mcleod
On 29 Dec 2017, at 11:08 am, Joe Julian <[email protected]> wrote:
The reason for the long (42 second) ping-timeout is that
re-establishing fds and locks can be a very expensive operation.
With an average MTBF of 45000 hours for a server, even just a replica
2 would result in a 42 second MTTR every 2.6 years, or 6 nines of uptime.
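
For anyone who wants to check that arithmetic, here is a minimal Python
sketch. It rests on assumptions that are not stated above: server
failures are independent, so the failure rates of the two replicas
simply add; every failure costs exactly one ping-timeout of stalled
I/O; and a year is 365.25 days. The function name stall_stats is made
up for illustration.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length

def stall_stats(mtbf_hours, replicas, ping_timeout_s):
    """Estimate how often a ping-timeout stall happens and what it costs.

    Assumes independent failures, so a volume with `replicas` servers
    sees a failure `replicas` times as often as a single server.
    """
    hours_between_failures = mtbf_hours / replicas
    years_between_failures = hours_between_failures / HOURS_PER_YEAR
    # Fraction of time spent stalled: one timeout per failure interval.
    unavailability = ping_timeout_s / (hours_between_failures * 3600)
    # Number of leading nines in the availability figure.
    nines = math.floor(-math.log10(unavailability))
    return years_between_failures, unavailability, nines

years, unavailability, nines = stall_stats(45000, 2, 42)
print(f"one stall roughly every {years:.1f} years")
print(f"unavailability {unavailability:.2e}, i.e. {nines} nines of uptime")
```

With the figures quoted above (45000 hours, replica 2, 42 seconds) this
gives about 2.6 years between stalls and six nines.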
On December 27, 2017 3:17:01 AM PST, Omar Kohl <[email protected]> wrote:
Hi,
If you set it to 10 seconds, and a node goes down, you'll see
a 10-second freeze in all I/O for the volume.
Exactly! ONLY 10 seconds instead of the default 42 seconds :-)
As I said before, the problem with the 42 seconds is that a Windows Samba
Client will disconnect (and therefore interrupt any read/write operation) after
waiting for about 25 seconds. So 42 seconds is too high. In this case it would
therefore make more sense to reduce the ping-timeout, right?
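
To make that comparison concrete, here is a small Python sketch. The
25 second figure is the approximate Windows client limit mentioned
above; the function names and the volume name "myvol" are hypothetical.
The command it builds follows the usual
"gluster volume set <VOLNAME> <option> <value>" form and is only
printed here, not executed.

```python
SMB_CLIENT_LIMIT_S = 25  # approximate Windows client limit quoted above

def stall_outlasts_client(ping_timeout_s, client_limit_s=SMB_CLIENT_LIMIT_S):
    """True if a full ping-timeout stall is longer than the client will wait."""
    return ping_timeout_s > client_limit_s

def ping_timeout_command(volume, seconds):
    """Build (but do not run) the CLI call that sets network.ping-timeout."""
    return ["gluster", "volume", "set", volume,
            "network.ping-timeout", str(seconds)]

# Compare the default with the lower value under discussion.
for timeout in (42, 10):
    print(f"{timeout}s stall outlasts client: {stall_outlasts_client(timeout)}")

# "myvol" is a placeholder volume name.
print(" ".join(ping_timeout_command("myvol", 10)))
```

This only compares the two timers. It says nothing about the cost of
the extra heals and lock re-establishment that a lower value may cause.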
Has anyone done any performance measurements on what the implications of a low
ping-timeout are? What are the costs of "triggering heals all the time"?
On a related note I found the extras/hook-scripts/start/post/S29CTDBsetup.sh
script that mounts a CTDB (Samba) share and explicitly sets
the ping-timeout to 10 seconds. There is a comment saying: "Make sure ping-timeout is not
default for CTDB volume". Unfortunately there is no explanation in the script, in the
commit or in the Gerrit review history
(https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) for WHY you make
sure ping-timeout is not default. Can anyone tell me the reason?
Kind regards,
Omar
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, 26 December 2017 22:05
To: [email protected]
Subject: Re: [Gluster-users] Exact purpose of network.ping-timeout
Hi,
It's just the delay for which a node can stop responding before being
marked as down.
Basically that's how long a node can go down before a heal becomes
necessary to bring it back.
If you set it to 10 seconds, and a node goes down, you'll see a 10-second
freeze in all I/O for the volume. That's why you don't want it too high (having
a 2-minute freeze on I/O for example would be pretty bad, depending on what you
host), but you don't want it too low either (to avoid triggering heals all the
time).
You can configure it because it depends on what you host. You might be okay
with a freeze of a few minutes to avoid a heal, or you might not care about heals at
all and prefer a very low value to avoid freezes.
The default value should work pretty well for most things though.
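
The trade-off described above can be written down as a toy model in
Python. This is only a sketch of the description in this mail, not of
Gluster's actual code: the function name outage_outcome is made up,
an outage exactly as long as the timeout is assumed to count as "marked
down", and the cost of the heal itself is not modelled.

```python
def outage_outcome(outage_s, ping_timeout_s=42):
    """Toy model: what one node outage costs for a given ping-timeout.

    If the node answers again before the timeout expires, I/O was frozen
    for the length of the outage and no heal is needed. Otherwise the
    node is marked down after the timeout, I/O resumes on the remaining
    nodes, and a heal is needed when the node comes back.
    """
    if outage_s < ping_timeout_s:
        return {"io_freeze_s": outage_s, "heal_needed": False}
    return {"io_freeze_s": ping_timeout_s, "heal_needed": True}

# An 11 second network outage under the default and under a low setting.
for timeout in (42, 10):
    print(timeout, outage_outcome(11, ping_timeout_s=timeout))
```

For an 11 second outage the model shows the two sides of the choice:
with the default the volume freezes for the whole outage but needs no
heal, with 10 seconds the freeze is capped at 10 seconds at the price
of a heal.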
On Tue, Dec 26, 2017 at 01:11:48PM +0000, Omar Kohl wrote:
Hi,

I have a question regarding the "ping-timeout" option. I have been
researching its purpose for a few days and it is not completely clear
to me, especially since it is apparently strongly encouraged by the
Gluster community not to change, or at least not to decrease, this
value!

Assuming that I set ping-timeout to 10 seconds (instead of the default
42) this would mean that if I have a network outage of 11 seconds then
Gluster internally would have to re-allocate some resources that it
freed after the 10 seconds, correct? But apart from that there are no
negative implications, are there? For instance if I'm copying files
during the network outage then those files will continue copying after
those 11 seconds.

This means that the only purpose of ping-timeout is to save those extra
resources that are used by "short" network outages. Is that correct?

If I am confident that my network will not have many 11 second outages,
and if they do occur I am willing to incur those extra costs due to
resource allocation, is there any reason not to set ping-timeout to 10
seconds?

The problem I have with a long ping-timeout is that the Windows Samba
Client disconnects after 25 seconds. So if one of the nodes of a Gluster
cluster shuts down ungracefully then the Samba Client disconnects and
the file that was being copied is incomplete on the server. These
"costs" seem to be much higher than the potential costs of those Gluster
resource re-allocations. But it is hard to estimate because there is no
clear documentation of what exactly those Gluster costs are.

In general I would be very interested in a comprehensive explanation of
ping-timeout and the up- and downsides of setting high or low values
for it.

Kind regards,
Omar
------------------------------------------------------------------------
Gluster-users mailing list [email protected]
<mailto:[email protected]>
http://lists.gluster.org/mailman/listinfo/gluster-users