It has been 15 days since the discussion started. The only remaining concern is Srini's, about whether we should reuse the KEEPALIVE_TIMEOUT value for TCP_USER_TIMEOUT. After an offline discussion, we have decided to avoid making keepalives any more complex to configure correctly, and to reuse the KEEPALIVE_TIMEOUT channel argument for TCP_USER_TIMEOUT.
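For illustration only, here is a minimal sketch of that rule as it was settled in the thread quoted below: when keepalive is on, hand the KEEPALIVE_TIMEOUT value to the kernel as TCP_USER_TIMEOUT, and otherwise leave the socket alone. The function names, the millisecond units for the inputs, and the use of `math.inf` for an "infinite" KEEPALIVE_TIME are assumptions made for this example, not the gRPC implementation or its API. The socket option itself is only available on Linux 2.6.37 and later, so the sketch silently skips it elsewhere.

```python
import math
import socket
from typing import Optional


def user_timeout_ms(keepalive_time_ms: float, keepalive_timeout_ms: int) -> Optional[int]:
    """Return the TCP_USER_TIMEOUT to request, or None to leave the socket untouched.

    Hypothetical helper: keepalive is treated as "off" when KEEPALIVE_TIME
    is infinite, which is the default described in the thread.
    """
    if math.isinf(keepalive_time_ms):
        return None
    # No minimum is enforced here; that question was left for the future.
    return keepalive_timeout_ms


def apply_user_timeout(sock: socket.socket,
                       keepalive_time_ms: float,
                       keepalive_timeout_ms: int) -> bool:
    """Set TCP_USER_TIMEOUT on sock if keepalive is on and the platform supports it.

    Returns True only if the option was actually set.
    """
    timeout = user_timeout_ms(keepalive_time_ms, keepalive_timeout_ms)
    # socket.TCP_USER_TIMEOUT is only defined where the platform supports it.
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if timeout is None or option is None:
        return False
    # The kernel expects the value in milliseconds.
    sock.setsockopt(socket.IPPROTO_TCP, option, timeout)
    return True
```

With keepalive disabled, `apply_user_timeout(sock, math.inf, 20000)` changes nothing, which preserves the current behavior where KEEPALIVE_TIMEOUT has no effect unless KEEPALIVE_TIME is set.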
The discussion on whether a minimum value should be enforced on either TCP_USER_TIMEOUT or KEEPALIVE_TIMEOUT will be left for the future. Marking this proposal as final.

On Monday, August 27, 2018 at 5:20:07 PM UTC-7, [email protected] wrote:
>
> Proposal has been updated.
>
> On Friday, August 24, 2018 at 4:04:47 PM UTC-7, Eric Anderson wrote:
>>
>>> This would change the semantics slightly, as right now the value does
>>> nothing when KEEPALIVE_TIME is infinite (the default).
>>
>> After sleeping on this, I think that we can enable TCP_USER_TIMEOUT only
>> when keepalive is on. That resolves this changing of semantics.
>>
>> So the proposal is: when keepalive is on, tell the kernel the
>> TCP_USER_TIMEOUT is the value of KEEPALIVE_TIMEOUT.
>>
>>> Also,
>>> https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md
>>> specifies that KEEPALIVE_TIME is restricted to 10 seconds, but doesn't
>>> seem to impose a similar restriction on KEEPALIVE_TIMEOUT
>>
>> As I mentioned on the PR, that seems like a bit of an oversight. But I
>> agree and I'll say that any discussion about enforcing a minimum value of
>> KEEPALIVE_TIMEOUT can be a separate discussion and doesn't need to happen
>> now.
>>
>> On Thu, Aug 23, 2018 at 7:26 PM 'Srini Polavarapu' via grpc.io <
>> [email protected]> wrote:
>>
>>> In my opinion, gRPC should not set an artificial limit on the minimum
>>> value of TCP_USER_TIMEOUT. It is a well-known option that has been
>>> available in Linux for a long time. It should be a pass-through value
>>> for gRPC, as gRPC does not modify the kernel behavior w.r.t. this
>>> setting. There are applications (e.g. in graphics design) where huge
>>> amounts of data need to be transferred on lossless fabric and
>>> sub-second network error detection is crucial. There are setups where
>>> retransmissions are extremely rare and treated as errors. Setting an
>>> arbitrary minimum value of 10 secs doesn't seem right.
>>>
>>> On Thursday, August 23, 2018 at 10:53:16 AM UTC-7, [email protected]
>>> wrote:
>>>>
>>>> Also,
>>>> https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md
>>>> specifies that KEEPALIVE_TIME is restricted to 10 seconds, but doesn't
>>>> seem to impose a similar restriction on KEEPALIVE_TIMEOUT
>>>>
>>>> On Thursday, August 23, 2018 at 10:21:08 AM UTC-7, [email protected]
>>>> wrote:
>>>>>
>>>>> I like the idea of reusing the channel option KEEPALIVE_TIMEOUT for
>>>>> this, but I am hesitant for exactly the reason that you pointed out.
>>>>> It would give meaning to KEEPALIVE_TIMEOUT even if keepalive is
>>>>> disabled by setting KEEPALIVE_TIME to infinite. Also, given the fact
>>>>> that TCP_USER_TIMEOUT is not supported on all platforms, it would
>>>>> mean that KEEPALIVE_TIMEOUT would behave differently on different
>>>>> systems. On the other hand, if we isolate this as a separate
>>>>> parameter for only those platforms that support it, it allows us to
>>>>> explicitly say that it is only valid for Linux kernel versions 2.6.37
>>>>> and later.
>>>>>
>>>>> TCP_USER_TIMEOUT should not have any effect on retransmits, other
>>>>> than shutting down the connection (which of course might prevent a
>>>>> retransmit from taking place). I am currently of the opinion that if
>>>>> an application decides to change the timeout value from the default
>>>>> of 20 seconds, it is doing so knowingly and owns the responsibility
>>>>> of connections being dropped because of that.
>>>>>
>>>>> On Thursday, August 23, 2018 at 8:45:15 AM UTC-7, Eric Anderson wrote:
>>>>>>
>>>>>> Also, this stuff is pretty complex for users already. Adding *yet
>>>>>> another* configuration parameter just worsens that. I'd much rather
>>>>>> they just set one set of parameters and we make the most use of them
>>>>>> as we can on each platform.
>>>>>>
>>>>>> On Thu, Aug 23, 2018 at 8:43 AM Eric Anderson <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I'd prefer we re-used KEEPALIVE_TIMEOUT for this. This would change
>>>>>>> the semantics slightly, as right now the value does nothing when
>>>>>>> KEEPALIVE_TIME is infinite (the default). However, it makes a lot
>>>>>>> of sense to use the same value for both entries because they have
>>>>>>> mostly-shared fate. The only difference is that keepalive goes
>>>>>>> through the remote application whereas TCP_USER_TIMEOUT can be
>>>>>>> triggered directly by the kernel. The kernel will delay ACKs to
>>>>>>> combine them or to attach them to outgoing data. So when sending a
>>>>>>> keepalive, I'd expect the application to influence how soon data is
>>>>>>> ACK'ed, so they would be transmitted on the same packet frequently.
>>>>>>>
>>>>>>> Also, KEEPALIVE_TIMEOUT is limited to no lower than 10 seconds.
>>>>>>> That is a very appropriate limit for TCP_USER_TIMEOUT as well, as
>>>>>>> application authors will commonly think "oh, a second looks good!"
>>>>>>> or "Oh, 100ms is plenty!". But that ignores retransmits and puts
>>>>>>> applications in a very dangerous position that can cause network
>>>>>>> collapse when the network slows down, even with datacenter
>>>>>>> networks.
>>>>>>>
>>>>>>> On Wed, Aug 22, 2018 at 1:23 PM yashkt via grpc.io <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> This is the discussion thread for the proposal at
>>>>>>>> https://github.com/grpc/proposal/pull/95
>>>>>>>>
>>>>>>>> The proposal is to provide an option to set the socket
>>>>>>>> TCP_USER_TIMEOUT for platforms running on Linux kernels 2.6.37 and
>>>>>>>> later.
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "grpc.io" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/grpc-io/4d585ee1-2dba-4895-9d55-b637a587b93d%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/grpc-io/4d585ee1-2dba-4895-9d55-b637a587b93d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
