Re: [Gluster-users] Exact purpose of network.ping-timeout

Joe Julian Thu, 11 Jan 2018 10:37:05 -0800

Answers inline.

On 12/29/2017 01:10 AM, Omar Kohl wrote:

Hi,


I know that "glusterbot" text about ping-timeout almost by heart by now ;-) I 
have searched the complete IRC logs and Mailing list from the last 4 or 5 years for 
anything related to ping-timeout.

I have to laugh, because I'm the one that wrote that text and I have to use glusterbot to remember it. :D


The problem with "can be a very expensive operation" is that this is extremely 
vague. It would be helpful to put some numbers behind it. Of course I also understand 
that any numbers would be very case specific and would not necessarily generalize to 
other use cases. So anyway...

It is vague, I agree. When I first started using gluster it was huge. It would take over half an hour of high server load and slow fops (30% of normal) after a ping-timeout for our servers with about 5000 open FDs. It would be so slow that the ping packets would get delayed for sometimes 30 seconds. If the ping-timeout had been shorter than that, this would have caused the whole process to start all over again.

I haven't had the time or the use case to look at whether or not that's been mitigated at all in more recent versions. I suspect at least some of it has.

Coming back to my original problem: If a Microsoft Windows client mounts a 
Samba Share with an underlying Gluster volume and this volume goes away for 
more than 25 seconds then the Samba Share is dead and any file operation is 
cancelled. This means for instance that a big file that is being copied will be 
stored in an incomplete state in the Gluster volume. This is especially 
annoying since one server (Gluster brick) is online the whole time and all 
operations could in theory have continued without problems.

If the fop is cancelled, the application should get an error and the application /should/ handle that error to prevent data loss. That's POSIX. Having this happen once every couple of years should be within expectations (if it's not, expectations should be managed better, imho).

As an aside, I can't find 25 seconds in the Samba source. Are you sure? Do you have some sort of log message to track this down with?


If I reduce the ping-timeout to something like 5 seconds the problem goes away! 
File operations in the Samba Share will stall for a few seconds and then 
everything will continue.

I understand that with a regular server shutdown this should never happen 
anyway. In practice (at least with CentOS 7) this does still happen (possibly 
because the network goes away too quickly, as you suggested) but it should be 
fixable. BUT I definitely want to support hard server crashes as well. The 
current behaviour of the Samba Share is not an option!

Yeah, I'd be curious what's killing the network. It shouldn't be killed at all. NetworkManager? I know that systemd-networkd doesn't break this.

Would you therefore say it is appropriate in my use case to decrease the 
ping-timeout? Or can you think of anything else that could/should be done? I 
have no control over the client.

Since there are plenty of layers that everything goes through, there are many reasons that
additional delays could be caused. So my first instinct would be to reduce ping-timeout as much as
possible to avoid coming near those "25 seconds". Hence my question about specific
data on what the "ping-timeout" costs are.
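
The budget described above can be sketched as a simple check. This is only an illustration: it assumes the worst-case stall is the ping-timeout plus whatever the other layers add, it takes the "about 25 seconds" client limit from the observation reported in this thread (not from the Samba source), and the 5 second extra delay is a made-up placeholder.

```python
def stall_fits_client(ping_timeout_s, extra_delay_s, client_limit_s=25.0):
    """Return True if the assumed worst-case I/O stall stays below the client limit.

    ping_timeout_s: value of network.ping-timeout in seconds.
    extra_delay_s:  hypothetical delay added by other layers (Samba, network, ...).
    client_limit_s: observed Windows client tolerance; an assumption, not a spec.
    """
    worst_case_stall = ping_timeout_s + extra_delay_s
    return worst_case_stall < client_limit_s


# Compare the default (42) with the two lower values mentioned in the thread.
for timeout in (42, 10, 5):
    verdict = "fits" if stall_fits_client(timeout, extra_delay_s=5) else "too long"
    print(f"ping-timeout {timeout:>2}s: {verdict}")
```

The check says nothing about the cost of re-establishing fds and locks after a timeout, which is the other half of the trade-off.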

I won't go so far as to say it's appropriate. If you know and test all the variables around changing a setting, it's your system and if you trust your own judgement for your use case who am I to second-guess. Since I first started trying to change it and failed, I've taken the approach of fixing any other problems that might get in the way of leaving it at 42 seconds. I've not been required to change the ping-timeout in any of /my/ production clusters.

What reinforces my belief that a 42 second ping-timeout is not appropriate for a Samba
share is the script from the Gluster repository I linked to in a
previous mail:

I found the extras/hook-scripts/start/post/S29CTDBsetup.sh script that mounts a CTDB 
(Samba) share and explicitly sets the ping-timeout to 10 seconds. There is a comment 
saying: "Make sure ping-timeout is not default for CTDB volume". Unfortunately 
there is no explanation in the script, in the commit or in the Gerrit review history 
(https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) for WHY you 
make sure ping-timeout is not default. Can anyone tell me the reason?
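
For reference, the option is set per volume through the gluster CLI (`gluster volume set <VOLNAME> network.ping-timeout <seconds>`). The sketch below only builds and prints that command; the volume name `ctdb_vol` is hypothetical, and this is not a copy of what the hook script does.

```python
import shlex


def ping_timeout_command(volume, seconds):
    """Build the gluster CLI argument list that sets network.ping-timeout."""
    if seconds <= 0:
        raise ValueError("ping-timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volume,
            "network.ping-timeout", str(seconds)]


# "ctdb_vol" is a placeholder volume name, used for illustration only.
cmd = ping_timeout_command("ctdb_vol", 10)
print(shlex.join(cmd))
```

On a node with the gluster CLI installed, the returned list could be handed to `subprocess.run(cmd, check=True)`.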

I've CC'd Harsha to see if he has any feedback on that. He's off working on Minio now, but maybe he remembers or has an opinion.

Thanks for your help!
Kind regards,
Omar

-----Original Message-----
From: Joe Julian [mailto:[email protected]]
Sent: Friday, 29 December 2017 06:35
To: Sam McLeod <[email protected]>
Cc: Gluster Users <[email protected]>; Omar Kohl
<[email protected]>
Subject: Re: [Gluster-users] Exact purpose of network.ping-timeout

Restarts will go through a shutdown process. As long as the network isn't 
actively unconfigured before the final kill, the tcp connection will be 
shutdown and there will be no wait.



On 12/28/17 20:19, Sam McLeod wrote:


        Sure, if you never restart / autoscale anything and if your use case
isn't bothered by up to 42 seconds of downtime. For us, 42 seconds is a
really long time for something like a patient management system to refuse file
attachment uploads etc...

        We apply a strict patching policy for security and kernel updates, we 
often also load balance between underlying physical hosts and if the virtual 
hosts have lots of storage it can be quicker to let them shutdown and start on 
another host.

        So for us, gone are the old Unix days of caring about uptime, a huge 
part of our measurement of success and risk reduction has become how quickly we 
can not just deploy our software / web apps into production but also how 
quickly our platform can be reformed, patched and migrated as is effective.

        So in reality, I'd probably do a rolling restart of our three-node gluster
clusters every few weeks or so, depending on what patches have been released
etc...
        

        --
        Sam McLeod
        https://smcleod.net
        https://twitter.com/s_mcleod


                On 29 Dec 2017, at 11:08 am, Joe Julian <[email protected] 
<mailto:[email protected]> > wrote:

                The reason for the long (42 second) ping-timeout is because 
re-establishing fd's and locks can be a very expensive operation. With an 
average MTBF of 45000 hours for a server, even just a replica 2 would result in 
a 42 second MTTR every 2.6 years, or 6 nines of uptime.
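
A rough check of the arithmetic above, as a sketch. It assumes the two servers fail independently, that each failure costs exactly one ping-timeout of stalled I/O, and a 365.25-day year; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365.25


def replica_outage_stats(mtbf_hours, servers, ping_timeout_s):
    """Estimate years between stalls and availability for a replicated volume."""
    # With `servers` independent machines, one of them fails on average
    # every mtbf_hours / servers hours.
    hours_between_failures = mtbf_hours / servers
    years_between_failures = hours_between_failures / HOURS_PER_YEAR
    # Fraction of time spent waiting for the ping-timeout to expire.
    unavailability = ping_timeout_s / (hours_between_failures * 3600)
    return years_between_failures, 1 - unavailability


years, availability = replica_outage_stats(45000, servers=2, ping_timeout_s=42)
print(f"{years:.2f} years between stalls, availability {availability:.7%}")
```

Under these assumptions the result should land close to the quoted 2.6 years, with availability just above 99.9999%.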
                
                
                On December 27, 2017 3:17:01 AM PST, Omar Kohl <[email protected] 
<mailto:[email protected]> > wrote:

                        Hi,
                        

                                 If you set it to 10 seconds, and a node goes 
down, you'll see a 10 second freeze in all I/O for the volume.


                        Exactly! ONLY 10 seconds instead of the default 42 
seconds :-)
                        
                        As I said before the problem with the 42 seconds is 
that a Windows Samba Client will disconnect (and therefore interrupt any 
read/write operation) after waiting for about 25 seconds. So 42 seconds is too 
high. In this case it would therefore make more sense to reduce the 
ping-timeout, right?
                        
                        Has anyone done any performance measurements on what the 
implications of a low ping-timeout are? What are the costs of "triggering heals all 
the time"?
                        
                        On a related note I found the
extras/hook-scripts/start/post/S29CTDBsetup.sh script that
mounts a CTDB (Samba) share and explicitly sets the ping-timeout to 10 seconds. There is a 
comment saying: "Make sure ping-timeout is not default for CTDB volume". 
Unfortunately there is no explanation in the script, in the commit or in the Gerrit review 
history (https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) for WHY 
you make sure ping-timeout is not default. Can anyone tell me the reason?
                        
                        Kind regards,
                        Omar
                        
                        -----Original Message-----
                        From: [email protected]
<mailto:[email protected]>
[mailto:[email protected]] On behalf of [email protected]
<mailto:[email protected]>
                        Sent: Tuesday, 26 December 2017 22:05
                        To: [email protected]
<mailto:[email protected]>
                        Subject: Re: [Gluster-users] Exact purpose of
network.ping-timeout
                        
                        Hi,
                        
                        It's just the delay for which a node can stop 
responding before being marked as down.
                        Basically that's how long a node can go down before a 
heal becomes necessary to bring it back.
                        
                        If you set it to 10 seconds, and a node goes down,
you'll see a 10 second freeze in all I/O for the volume. That's why you don't
want it too high (having a 2 minute freeze on I/O for example would be pretty
bad, depending on what you host), but you don't want it too low either (to
avoid triggering heals all the time).
                        
                        You can configure it because it depends on what you
host. You might be okay with a freeze of a few minutes to avoid a heal, or you might
not care about heals at all and prefer a very low value to avoid freezes.
                        The default value should work pretty well for most
things though.
                        
                        On Tue, Dec 26, 2017 at 01:11:48PM +0000, Omar Kohl 
wrote:

                                 Hi,
                                
                                 I have a question regarding the "ping-timeout"
option. I have been researching its purpose for a few days and it is not completely clear
to me, especially since the Gluster community apparently strongly discourages changing,
or at least decreasing, this value!
                                
                                 Assuming that I set ping-timeout to 10 seconds 
(instead of the default 42) this would mean that if I have a network outage of 
11 seconds then Gluster internally would have to re-allocate some resources 
that it freed after the 10 seconds, correct? But apart from that there are no 
negative implications, are there? For instance if I'm copying files during the 
network outage then those files will continue copying after those 11 seconds.
                                
                                 This means that the only purpose of ping-timeout is to 
save those extra resources that are used by "short" network outages. Is that 
correct?
                                
                                 If I am confident that my network will not 
have many 11 second outages and if they do occur I am willing to incur those 
extra costs due to resource allocation is there any reason not to set 
ping-timeout to 10 seconds?
                                
                                 The problem I have with a long ping-timeout is that the 
Windows Samba Client disconnects after 25 seconds. So if one of the nodes of a Gluster 
cluster shuts down ungracefully then the Samba Client disconnects and the file that was 
being copied is incomplete on the server. These "costs" seem to be much higher 
than the potential costs of those Gluster resource re-allocations. But it is hard to 
estimate because there is no clear documentation of what exactly those Gluster costs are.
                                
                                 In general I would be very interested in a 
comprehensive explanation of ping-timeout and the up- and downsides of setting 
high or low values for it.
                                
                                 Kind regards,
                                 Omar
________________________________


                                 Gluster-users mailing list
                                 [email protected] 
<mailto:[email protected]>
                                 
http://lists.gluster.org/mailman/listinfo/gluster-users



                --
                Sent from my Android device with K-9 Mail. Please excuse my 
brevity.
