I thought your original message said you were simulating the server going down using iptables and causing packet loss?
> On Nov 21, 2018, at 9:52 AM, [email protected] wrote:
>
> I'm not sure I follow you on that one. I am taking the server up and down
> myself. Everything works fine if I just make RPC calls on the client and
> check the error codes. The problem was the 20 seconds of blocking on
> secondary RPC calls during the reconnect, which seems to be due to the
> backoff algorithm. I was hoping to shrink that wait to something smaller.
> Setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 still seemed to take the
> full 20 seconds when making an RPC call.
>
> Using GetState on the channel looked like it would get rid of the blocking
> on a broken connection, but the state of the channel doesn't seem to change
> from TRANSIENT_FAILURE once the server comes back up. I tried using
> KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS, but
> those didn't seem to trigger a state change on the channel.
>
> It seems like the only way to trigger a state change on the channel is to
> make an actual RPC call.
>
> I think the answer might just be to update to a newer version of gRPC, look
> at using the MIN_RECONNECT_BACKOFF channel arg, and probably download the
> source and look at how those variables are used :).
>
>
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>> The other thing to keep in mind is that the way you are “forcing failure”
>> is error prone - the connection is valid as long as packets are making it
>> through; it is just that it will be very slow due to extreme packet loss.
>> I am not sure this is considered a failure by gRPC. I think you would need
>> to detect slow network connections and abort that server yourself.
>>
>>> On Nov 21, 2018, at 9:12 AM, [email protected] wrote:
>>>
>>> I do check the error code after each update and skip the rest of the
>>> current iteration's updates if a failure occurred.
>>>
>>> I could skip all updates for 20 seconds after a failed update, but that
>>> seems less than ideal.
>>>
>>> By "server available" I meant using GetState on the channel. The problem
>>> I ran into is that if I only call GetState on the channel to see whether
>>> the server is around, it "forever" stays in TRANSIENT_FAILURE (at least
>>> for 60 seconds). I was expecting to see a state change back to IDLE/READY
>>> after a bit.
>>>
>>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>> You should check the error after each update, and if non-nil, just
>>>> return… why keep trying the further updates in that loop?
>>>>
>>>> It is also trivial to not even attempt the next loop if it has been less
>>>> than N ms since the last error.
>>>>
>>>> According to your pseudocode, you already have the ‘server available’
>>>> status.
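A rough sketch of the non-blocking "server available" check being discussed
above, against the C++ channel API (untested, and note the caveat in the
thread: with no RPC or keepalive traffic the channel can sit in
TRANSIENT_FAILURE until its next scheduled reconnect attempt, so this bounds
how long you wait rather than forcing a faster reconnect):

#include <chrono>
#include <memory>
#include <grpc++/grpc++.h>

// Ask the channel for its current state, kick off a connect attempt if it
// is IDLE, and wait at most `budget` for the state to change. This never
// blocks longer than `budget`, unlike an RPC issued during reconnect backoff.
bool ServerAvailable(const std::shared_ptr<grpc::Channel>& channel,
                     std::chrono::milliseconds budget) {
  grpc_connectivity_state state = channel->GetState(/*try_to_connect=*/true);
  if (state == GRPC_CHANNEL_READY) return true;
  auto deadline = std::chrono::system_clock::now() + budget;
  if (channel->WaitForStateChange(state, deadline)) {
    return channel->GetState(/*try_to_connect=*/false) == GRPC_CHANNEL_READY;
  }
  return false;  // deadline expired with no state change
}

The update loop could call this with a small budget (say, 50 ms) before Begin
Update and skip the whole iteration when it returns false, rather than letting
an RPC absorb the backoff wait.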
>>>>> On Nov 20, 2018, at 9:22 PM, [email protected] wrote:
>>>>>
>>>>> GRPC Version: 1.3.9
>>>>> Platform: Windows
>>>>>
>>>>> I'm working on a prototype application that periodically calculates
>>>>> data and then, in a multi-step process, pushes the data to a server.
>>>>> The design is that the server doesn't need to be up and can go down
>>>>> mid-process. The client should not block (or block as little as
>>>>> possible) between updates if there is a problem pushing data.
>>>>>
>>>>> A simple model for the client would be:
>>>>>
>>>>> Loop Until Done
>>>>> {
>>>>>     Calculate Data
>>>>>     If Server Available and No Error: Begin Update
>>>>>     If Server Available and No Error: UpdateX (Optional)
>>>>>     If Server Available and No Error: UpdateY (Optional)
>>>>>     If Server Available and No Error: UpdateZ (Optional)
>>>>>     If Server Available and No Error: End Update
>>>>> }
>>>>>
>>>>> The client doesn't care if the server is available, but if it is, it
>>>>> should push the data; if any call errors, skip everything else until
>>>>> the next update.
>>>>>
>>>>> The problem is that if I make a call on the client (and the server
>>>>> isn't available) the first call fails very quickly (~1 sec) and the
>>>>> rest take a "long" time, ~20 sec. It looks like this is due to the
>>>>> reconnect backoff time. I tried setting
>>>>> GRPC_ARG_MAX_RECONNECT_BACKOFF_MS in the channel args to a lower value
>>>>> (2000), but that didn't have any positive effect.
>>>>>
>>>>> I tried using GetState(true) on the channel to determine whether we
>>>>> need to skip an update. This check fails very quickly, but the channel
>>>>> never seems to get out of the TRANSIENT_FAILURE state after the server
>>>>> was restarted (I waited over 60 seconds). From the documentation it
>>>>> looked like the parameter to GetState only controls whether a channel
>>>>> in the IDLE state attempts to reconnect.
>>>>>
>>>>> What is the best way to achieve the functionality we'd like?
>>>>>
>>>>> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option
>>>>> added in a later version of gRPC; would that cause the gRPC call to
>>>>> "fail fast" if I upgraded and set it to a low value, ~1 sec?
>>>>>
>>>>> Is there a better way to handle this situation in general?
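For reference, a rough sketch of how the channel args mentioned throughout
this thread are set from the C++ API. GRPC_ARG_MIN_RECONNECT_BACKOFF_MS only
exists in releases newer than the 1.3.9 mentioned above, and the values here
are illustrative, not recommendations:

#include <memory>
#include <string>
#include <grpc++/grpc++.h>

std::shared_ptr<grpc::Channel> MakeChannel(const std::string& target) {
  grpc::ChannelArguments args;
  // Tighten the reconnect backoff so a down server is retried sooner.
  args.SetInt(GRPC_ARG_MIN_RECONNECT_BACKOFF_MS, 1000);  // newer gRPC only
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 2000);
  // Keepalive pings, so a dead connection can be noticed without an RPC.
  args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);
  args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 2000);
  args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
  return grpc::CreateCustomChannel(
      target, grpc::InsecureChannelCredentials(), args);
}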
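One more way to bound the ~20-second blocking on secondary calls, independent
of the backoff settings: a per-RPC deadline. A sketch using hypothetical
generated types (MyService, UpdateXRequest, and UpdateXReply stand in for the
real proto definitions):

#include <chrono>
#include <grpc++/grpc++.h>
#include "my_service.grpc.pb.h"  // hypothetical generated header

// If the channel is mid reconnect-backoff, the call completes with
// DEADLINE_EXCEEDED after ~1 second instead of waiting out the backoff.
grpc::Status UpdateXWithDeadline(MyService::Stub* stub,
                                 const UpdateXRequest& request,
                                 UpdateXReply* reply) {
  grpc::ClientContext ctx;
  ctx.set_deadline(std::chrono::system_clock::now() +
                   std::chrono::seconds(1));
  return stub->UpdateX(&ctx, request, reply);
}

The loop's "No Error" checks then work unchanged: any status other than OK
(UNAVAILABLE, DEADLINE_EXCEEDED, ...) skips the rest of the iteration.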
