I’m glad you got it working. Something doesn’t seem right, though: going from 1 sec to 2 sec shouldn’t be the difference between failing and working. I would think that production anomalies could easily cause similar degradation, so this solution may be fragile. Still, I’m not sure I 100% get the problem you’re having, but if you understand exactly why it works, that’s good enough for me :)
> On Nov 21, 2018, at 11:09 AM, justin.cheuvr...@ansys.com wrote:
>
> I found a fix for my problem.
>
> I looked at the latest source and there was a test argument,
> "grpc.testing.fixed_reconnect_backoff_ms", that sets both the min and max
> backoff times to that value. I found the source for 1.3.9 on a machine and it
> was being used there as well. Setting that argument to 2000 ms does what I
> wanted. 1000 ms seemed to be too low a value; the RPC calls continued to
> fail. 2000 seems to reconnect just fine. Once we update to a newer version of
> gRPC I can change that to set the min and max backoff times directly.
>
>> On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>>
>> Sorry. Still, if you forcibly remove the cable or hard-shut the server, the
>> client can’t tell whether the server is down without some sort of ping-pong
>> protocol with a timeout. The TCP/IP timeout is on the order of hours, and
>> minutes if keep-alive is set.
>>
>>> On Nov 21, 2018, at 10:20 AM, justin.c...@ansys.com wrote:
>>>
>>> That must have been a different person :). I'm actually taking down the
>>> server and restarting it, not simulating it.
>>>
>>>> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>>>>
>>>> I thought your original message said you were simulating the server going
>>>> down using iptables and causing packet loss?
>>>>
>>>>> On Nov 21, 2018, at 9:52 AM, justin.c...@ansys.com wrote:
>>>>>
>>>>> I'm not sure I follow you on that one. I am taking the server up and down
>>>>> myself. Everything works fine if I just make RPC calls on the client and
>>>>> check the error codes. The problem was the 20 seconds of blocking on
>>>>> secondary RPC calls during the reconnect, which seems to be due to the
>>>>> backoff algorithm. I was hoping to shrink that wait to something smaller.
>>>>> Setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 still seemed to take
>>>>> the full 20 seconds when making an RPC call.
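The fixed-backoff fix above can be illustrated numerically. This is a minimal sketch, not gRPC code: the constants follow gRPC's published connection-backoff policy (initial backoff 1 s, multiplier 1.6, cap 120 s), and the policy's random jitter is left out for clarity. Pinning min == max, which is what "grpc.testing.fixed_reconnect_backoff_ms" does, collapses the whole schedule to one constant delay.

```python
def backoff_schedule(attempts, initial_ms=1000, multiplier=1.6, max_ms=120_000):
    """Return the delay (ms) before each of the next `attempts` reconnects,
    per gRPC's exponential connection-backoff policy (jitter omitted)."""
    delays, current = [], float(initial_ms)
    for _ in range(attempts):
        delays.append(round(min(current, max_ms)))
        current *= multiplier
    return delays

# Default policy: delays grow quickly, so a channel that has been failing
# for a while can wait many seconds before it even tries to reconnect.
print(backoff_schedule(6))   # [1000, 1600, 2560, 4096, 6554, 10486]

# Fixed 2000 ms backoff (min == max == 2000): every retry waits exactly 2 s.
print(backoff_schedule(6, initial_ms=2000, max_ms=2000))   # [2000] * 6
```

This also suggests why 1000 ms "seemed too low" in the thread: a shorter fixed backoff means the channel may still be mid-handshake when the next RPC is attempted, whereas 2000 ms leaves room for the reconnect to complete.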
>>>>>
>>>>> Using GetState on the channel looked like it would get rid of the
>>>>> blocking on a broken connection, but the state of the channel doesn't
>>>>> seem to change from TRANSIENT_FAILURE once the server comes back up. I
>>>>> tried KEEPALIVE_TIME, KEEPALIVE_TIMEOUT, and
>>>>> KEEPALIVE_PERMIT_WITHOUT_CALLS, but those didn't seem to trigger a
>>>>> state change on the channel either.
>>>>>
>>>>> It seems the only way to trigger a state change on the channel is to
>>>>> make an actual RPC call.
>>>>>
>>>>> I think the answer might just be to update to a newer version of gRPC,
>>>>> look at using the MIN_RECONNECT_BACKOFF channel arg, and probably
>>>>> download the source to see how those variables are used :).
>>>>>
>>>>>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>>>>>>
>>>>>> The other thing to keep in mind is that the way you are "forcing
>>>>>> failure" is error-prone: the connection is valid as long as packets
>>>>>> are making it through; it is just very slow due to extreme packet
>>>>>> loss. I am not sure this is considered a failure by gRPC. I think you
>>>>>> would need to detect slow network connections and abort that server
>>>>>> yourself.
>>>>>>
>>>>>>> On Nov 21, 2018, at 9:12 AM, justin.c...@ansys.com wrote:
>>>>>>>
>>>>>>> I do check the error code after each update and skip the rest of the
>>>>>>> current iteration's updates if a failure occurred.
>>>>>>>
>>>>>>> I could skip all updates for 20 seconds after a failure, but that
>>>>>>> seems less than ideal.
>>>>>>>
>>>>>>> By "server available" I meant using GetState on the channel. The
>>>>>>> problem I was running into was that if I only call GetState on the
>>>>>>> channel to see if the server is around, it "forever" stays in
>>>>>>> TRANSIENT_FAILURE (at least for 60 seconds). I was expecting to see a
>>>>>>> state change back to IDLE/READY after a bit.
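Since the channel state only appears to move when an RPC is actually attempted, one workaround is to keep the "server available" decision entirely client-side: after a failed call, skip updates until a cooldown has elapsed, then probe with a real RPC. This is a hypothetical helper, not a gRPC API; the thread uses the C++ API, but the idea is language-independent, sketched here in Python with an injectable clock so it can be exercised without sleeping:

```python
import time

class ServerGate:
    """Client-side 'server available' check: after an RPC error, treat the
    server as down for cooldown_s seconds rather than trusting channel
    state, which can sit in TRANSIENT_FAILURE until the next real call."""

    def __init__(self, cooldown_s=2.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._last_error = None  # timestamp of the most recent failure

    def record_error(self):
        self._last_error = self._clock()

    def record_success(self):
        self._last_error = None

    def should_try(self):
        """True if there is no recent failure, i.e. worth attempting an RPC."""
        return (self._last_error is None
                or self._clock() - self._last_error >= self.cooldown_s)

# Usage with a fake clock: a failure closes the gate for 2 s; the next
# RPC attempt after the cooldown acts as the probe that re-opens it.
now = [0.0]
gate = ServerGate(cooldown_s=2.0, clock=lambda: now[0])
gate.record_error()
print(gate.should_try())   # False: still inside the cooldown window
now[0] = 2.5
print(gate.should_try())   # True: time to attempt a probe RPC
```

This mirrors the "skip all updates for N ms after an error" idea from the thread, but with the window shrunk from 20 s to a configurable cooldown.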
>>>>>>>
>>>>>>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>>>>>>
>>>>>>>> You should track the err after each update and, if non-nil, just
>>>>>>>> return; why keep trying the further updates in that loop?
>>>>>>>>
>>>>>>>> It is also trivial to not even attempt the next loop iteration if it
>>>>>>>> has been less than N ms since the last error.
>>>>>>>>
>>>>>>>> According to your pseudocode, you already have the "server
>>>>>>>> available" status.
>>>>>>>>
>>>>>>>>> On Nov 20, 2018, at 9:22 PM, justin.c...@ansys.com wrote:
>>>>>>>>>
>>>>>>>>> gRPC version: 1.3.9
>>>>>>>>> Platform: Windows
>>>>>>>>>
>>>>>>>>> I'm working on a prototype application that periodically calculates
>>>>>>>>> data and then, in a multi-step process, pushes the data to a
>>>>>>>>> server. The design is that the server doesn't need to be up, or can
>>>>>>>>> go down mid-process. The client should not block (or block as
>>>>>>>>> little as possible) between updates if there is a problem pushing
>>>>>>>>> data.
>>>>>>>>>
>>>>>>>>> A simple model for the client would be:
>>>>>>>>>
>>>>>>>>> Loop Until Done
>>>>>>>>> {
>>>>>>>>>     Calculate Data
>>>>>>>>>     If Server Available and No Error: Begin Update
>>>>>>>>>     If Server Available and No Error: UpdateX (Optional)
>>>>>>>>>     If Server Available and No Error: UpdateY (Optional)
>>>>>>>>>     If Server Available and No Error: UpdateZ (Optional)
>>>>>>>>>     If Server Available and No Error: End Update
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> The client doesn't care whether the server is available, but if it
>>>>>>>>> is, it should push data; on any error it should skip everything
>>>>>>>>> else until the next update.
>>>>>>>>>
>>>>>>>>> The problem is that if I make a call on the client while the server
>>>>>>>>> isn't available, the first call fails very quickly (~1 sec) but the
>>>>>>>>> rest take a "long" time, ~20 sec. It looks like this is due to the
>>>>>>>>> reconnect backoff time.
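The pseudocode above amounts to short-circuiting the per-iteration pipeline on the first error, which is also what the reply suggests. A minimal sketch of that control flow (the step names mirror the pseudocode; `call` is a stand-in for a real stub method, and a real client would inspect the gRPC status code rather than a boolean):

```python
def push_update(steps):
    """Run one iteration's update steps in order, stopping at the first
    failure so later steps never block on a reconnecting channel.
    `steps` is a list of (name, call) pairs; each call returns True on
    success."""
    completed = []
    for name, call in steps:
        if not call():
            return completed, name  # skip the rest until the next iteration
        completed.append(name)
    return completed, None

# Example: the server drops out between UpdateX and UpdateY, so UpdateZ
# and EndUpdate are never attempted this iteration.
steps = [
    ("BeginUpdate", lambda: True),
    ("UpdateX",     lambda: True),
    ("UpdateY",     lambda: False),  # simulated RPC failure
    ("UpdateZ",     lambda: True),
    ("EndUpdate",   lambda: True),
]
done, failed = push_update(steps)
print(done)    # ['BeginUpdate', 'UpdateX']
print(failed)  # UpdateY
```

With this shape, only the first call after an outage pays the reconnect wait; every subsequent step in the same iteration is skipped outright.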
>>>>>>>>> I tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS in the channel
>>>>>>>>> args to a lower value (2000), but that didn't have any positive
>>>>>>>>> effect.
>>>>>>>>>
>>>>>>>>> I tried using GetState(true) on the channel to determine whether we
>>>>>>>>> need to skip an update. This call returns very quickly, but the
>>>>>>>>> channel never seems to get out of the TRANSIENT_FAILURE state after
>>>>>>>>> the server is restarted (I waited over 60 seconds). From the
>>>>>>>>> documentation, it looked like the try_to_connect param of GetState
>>>>>>>>> only triggers a reconnect attempt when the channel is in the IDLE
>>>>>>>>> state.
>>>>>>>>>
>>>>>>>>> What is the best way to achieve the functionality we'd like?
>>>>>>>>>
>>>>>>>>> I noticed a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option was added
>>>>>>>>> in a later version of gRPC; would that cause the gRPC call to "fail
>>>>>>>>> fast" if I upgraded and set it to a low value, ~1 sec?
>>>>>>>>>
>>>>>>>>> Is there a better way to handle this situation in general?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "grpc.io" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to grpc-io+u...@googlegroups.com.
>>>>>>>>> To post to this group, send email to grp...@googlegroups.com.
>>>>>>>>> Visit this group at https://groups.google.com/group/grpc-io.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/grpc-io/9fb7bf54-88fa-4781-8864-c9b2b06d5f0e%40googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.