I found a fix for my problem. Looking at the latest source, I saw a test argument, "grpc.testing.fixed_reconnect_backoff_ms", that sets both the min and max backoff times to the given value. I found the 1.3.9 source on a machine and the argument was in use there as well. Setting it to 2000 ms does what I wanted. 1000 ms seemed to be too low a value; the RPC calls continued to fail. 2000 ms reconnects just fine. Once we update to a newer version of gRPC I can switch to setting the min and max backoff times directly.
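For anyone hitting the same issue, here is roughly how I'm setting it. This is a minimal sketch assuming the C++ API (the target and credentials are illustrative); there is no public #define for this key in 1.3.x, the string below is what the core reads:

    #include <grpc++/grpc++.h>

    // Pin both the min and max reconnect backoff to 2000 ms via the
    // test-only channel argument found in the gRPC core source.
    std::shared_ptr<grpc::Channel> MakeFixedBackoffChannel(
        const std::string& target) {
      grpc::ChannelArguments args;
      args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 2000);
      return grpc::CreateCustomChannel(
          target, grpc::InsecureChannelCredentials(), args);
    }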
On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>
> Sorry. Still, if you forcibly remove the cable or hard shut down the server, the client can't tell that the server is down without some sort of ping/pong protocol with a timeout. The TCP/IP timeout is on the order of hours, and minutes if keepalive is set.
>
> On Nov 21, 2018, at 10:20 AM, [email protected] wrote:
>
> That must have been a different person :). I'm actually taking the server down and restarting it, not simulating it.
>
> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>>
>> I thought your original message said you were simulating the server going down using iptables and causing packet loss?
>>
>> On Nov 21, 2018, at 9:52 AM, [email protected] wrote:
>>
>> I'm not sure I follow you on that one. I am taking the server up and down myself. Everything works fine if I just make RPC calls on the client and check the error codes. The problem was the 20 seconds of blocking on secondary RPC calls during the reconnect, which seems to be due to the backoff algorithm. I was hoping to shrink that wait to something smaller if possible. Setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 still seemed to take the full 20 seconds when making an RPC call.
>>
>> Using GetState on the channel looked like it would get rid of the blocking on a broken connection, but the state of the channel doesn't seem to change from transient failure once the server comes back up. I tried KEEPALIVE_TIME, KEEPALIVE_TIMEOUT, and KEEPALIVE_PERMIT_WITHOUT_CALLS, but those didn't seem to trigger a state change on the channel.
>>
>> It seems the only way to trigger a state change on the channel is to make an actual RPC call.
>>
>> I think the answer might just be to update to a newer version of gRPC, look at using the MIN_RECONNECT_BACKOFF channel arg, and probably download the source and see how those variables are used :).
>>
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>>>
>>> The other thing to keep in mind is that the way you are "forcing failure" is error-prone: the connection is valid as long as packets are making it through; it is just that it will be very slow due to extreme packet loss. I am not sure this is considered a failure by gRPC. I think you would need to detect slow network connections and abort that server yourself.
>>>
>>> On Nov 21, 2018, at 9:12 AM, [email protected] wrote:
>>>
>>> I do check the error code after each update and skip the rest of the current iteration's updates if a failure occurred.
>>>
>>> I could skip all updates for 20 seconds after a failed update, but that seems less than ideal.
>>>
>>> By "server available" I meant using GetState on the channel. The problem I was running into was that if I only call GetState on the channel to see if the server is around, it "forever" stays in the transient failure state (at least for 60 seconds). I was expecting to see a state change back to idle/ready after a bit.
>>>
>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>>
>>>> You should track the err after each update and, if non-nil, just return; why keep trying the further updates in that loop?
>>>>
>>>> It is also trivial to not even attempt the next loop if it has been less than N ms since the last error.
>>>>
>>>> According to your pseudocode, you already have the 'server available' status.
>>>>
>>>> On Nov 20, 2018, at 9:22 PM, [email protected] wrote:
>>>>
>>>> gRPC version: 1.3.9
>>>> Platform: Windows
>>>>
>>>> I'm working on a prototype application that periodically calculates data and then pushes it to a server in a multi-step process. The design is that the server doesn't need to be up, and it can go down mid-process. The client should not block (or should block as little as possible) between updates if there is a problem pushing data.
>>>>
>>>> A simple model for the client would be:
>>>>
>>>> Loop Until Done
>>>> {
>>>>     Calculate Data
>>>>     If Server Available and No Error: Begin Update
>>>>     If Server Available and No Error: UpdateX (optional)
>>>>     If Server Available and No Error: UpdateY (optional)
>>>>     If Server Available and No Error: UpdateZ (optional)
>>>>     If Server Available and No Error: End Update
>>>> }
>>>>
>>>> The client doesn't care whether the server is available, but if it is, it should push data; on any error, skip everything else until the next update.
>>>>
>>>> The problem is that if I make a call on the client while the server isn't available, the first call fails very quickly (~1 sec) and the rest take a "long" time, ~20 sec. It looks like this is due to the reconnect backoff time. I tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args to a lower value (2000), but that didn't have any positive effect.
>>>>
>>>> I tried using GetState(true) on the channel to determine whether we need to skip an update. This check fails very quickly, but the channel never seems to get out of the transient failure state after the server is restarted (I waited over 60 seconds). From the documentation, it looked like the parameter to GetState only controls whether a channel in the idle state attempts to reconnect.
>>>>
>>>> What is the best way to achieve the functionality we'd like?
>>>>
>>>> I noticed there is a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added in a later version of gRPC. Would that cause the gRPC call to "fail fast" if I upgraded and set it to a low value, ~1 sec?
>>>>
>>>> Is there a better way to handle this situation in general?
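One more note on the blocking problem described above: independent of the backoff tuning, putting a deadline on each update call caps how long any single call can block while the channel is reconnecting, so the secondary calls cannot wait out the full backoff. A rough sketch, assuming the C++ sync API; MyService::Stub, Update, UpdateRequest, and UpdateReply are hypothetical stand-ins for the real generated types:

    #include <chrono>
    #include <grpc++/grpc++.h>

    // Returns false on any failure so the caller can skip the rest of
    // this iteration's updates.
    bool TryUpdate(MyService::Stub* stub, const UpdateRequest& request) {
      grpc::ClientContext ctx;
      // Cap the call at 500 ms: it cannot block past the deadline (it
      // fails with DEADLINE_EXCEEDED, or sooner with UNAVAILABLE).
      ctx.set_deadline(std::chrono::system_clock::now() +
                       std::chrono::milliseconds(500));
      UpdateReply reply;
      grpc::Status status = stub->Update(&ctx, request, &reply);
      return status.ok();
    }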
