I thought your original message said you were simulating the server going down using iptables and causing packet loss?
> On Nov 21, 2018, at 9:52 AM, [email protected] wrote:
>
> I'm not sure I follow you on that one. I am taking the server up and down
> myself. Everything works fine if I just make RPC calls on the client and
> check the error codes. The problem was the 20 seconds of blocking on
> secondary RPC calls during the reconnect, which seems to be due to the
> backoff algorithm. I was hoping to shrink that wait to something smaller.
> Setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 still seemed to take the
> full 20 seconds when making an RPC call.
>
> Using GetState on the channel looked like it would get rid of the blocking
> on a broken connection, but the state of the channel doesn't seem to change
> from TRANSIENT_FAILURE once the server comes back up. I tried using
> KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS, but
> those didn't seem to trigger a state change on the channel.
>
> It seems like the only way to trigger a state change on the channel is to
> make an actual RPC call.
>
> I think the answer might just be to update to a newer version of gRPC, look
> at using the MIN_RECONNECT_BACKOFF channel arg, and probably download the
> source and look at how those variables are used :).
>
>
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>> The other thing to keep in mind is that the way you are “forcing failure”
>> is error prone - the connection is valid as long as packets are making it
>> through; it is just that it will be very slow due to extreme packet loss.
>> I am not sure this is considered a failure by gRPC. I think you would need
>> to detect slow network connections and abort that server yourself.
>>
>>> On Nov 21, 2018, at 9:12 AM, [email protected] wrote:
>>>
>>> I do check the error code after each update and skip the rest of the
>>> current iteration's updates if a failure occurred.
>>>
>>> I could skip all updates for 20 seconds after a failed update, but that
>>> seems less than ideal.
>>>
>>> By "server available" I meant using GetState on the channel. The problem
>>> I ran into is that if I only call GetState on the channel to see whether
>>> the server is around, it "forever" stays in TRANSIENT_FAILURE (at least
>>> for 60 seconds). I was expecting to see a state change back to IDLE/READY
>>> after a bit.
>>>
>>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>> You should check the error after each update, and if non-nil, just
>>>> return… why keep trying the further updates in that loop?
>>>>
>>>> It is also trivial to not even attempt the next loop if it has been less
>>>> than N ms since the last error.
>>>>
>>>> According to your pseudocode, you already have the ‘server available’
>>>> status.
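A rough sketch of the non-blocking "server available" check being discussed
above, against the C++ channel API (untested, and note the caveat in the
thread: with no RPC or keepalive traffic the channel can sit in
TRANSIENT_FAILURE until its next scheduled reconnect attempt, so this bounds
how long you wait rather than forcing a faster reconnect):

#include <chrono>
#include <memory>
#include <grpc++/grpc++.h>

// Ask the channel for its current state, kick off a connect attempt if it
// is IDLE, and wait at most `budget` for the state to change. This never
// blocks longer than `budget`, unlike an RPC issued during reconnect backoff.
bool ServerAvailable(const std::shared_ptr<grpc::Channel>& channel,
                     std::chrono::milliseconds budget) {
  grpc_connectivity_state state = channel->GetState(/*try_to_connect=*/true);
  if (state == GRPC_CHANNEL_READY) return true;
  auto deadline = std::chrono::system_clock::now() + budget;
  if (channel->WaitForStateChange(state, deadline)) {
    return channel->GetState(/*try_to_connect=*/false) == GRPC_CHANNEL_READY;
  }
  return false;  // deadline expired with no state change
}

The update loop could call this with a small budget (say, 50 ms) before Begin
Update and skip the whole iteration when it returns false, rather than letting
an RPC absorb the backoff wait.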
>>>>> On Nov 20, 2018, at 9:22 PM, [email protected] wrote:
>>>>>
>>>>> GRPC Version: 1.3.9
>>>>> Platform: Windows
>>>>>
>>>>> I'm working on a prototype application that periodically calculates
>>>>> data and then, in a multi-step process, pushes the data to a server.
>>>>> The design is that the server doesn't need to be up and can go down
>>>>> mid-process. The client should not block (or block as little as
>>>>> possible) between updates if there is a problem pushing data.
>>>>>
>>>>> A simple model for the client would be:
>>>>>
>>>>> Loop Until Done
>>>>> {
>>>>>     Calculate Data
>>>>>     If Server Available and No Error: Begin Update
>>>>>     If Server Available and No Error: UpdateX (Optional)
>>>>>     If Server Available and No Error: UpdateY (Optional)
>>>>>     If Server Available and No Error: UpdateZ (Optional)
>>>>>     If Server Available and No Error: End Update
>>>>> }
>>>>>
>>>>> The client doesn't care if the server is available, but if it is, it
>>>>> should push the data; if any call errors, skip everything else until
>>>>> the next update.
>>>>>
>>>>> The problem is that if I make a call on the client (and the server
>>>>> isn't available) the first call fails very quickly (~1 sec) and the
>>>>> rest take a "long" time, ~20 sec. It looks like this is due to the
>>>>> reconnect backoff time. I tried setting
>>>>> GRPC_ARG_MAX_RECONNECT_BACKOFF_MS in the channel args to a lower value
>>>>> (2000), but that didn't have any positive effect.
>>>>>
>>>>> I tried using GetState(true) on the channel to determine whether we
>>>>> need to skip an update. This check fails very quickly, but the channel
>>>>> never seems to get out of the TRANSIENT_FAILURE state after the server
>>>>> was restarted (I waited over 60 seconds). From the documentation it
>>>>> looked like the parameter to GetState only controls whether a channel
>>>>> in the IDLE state attempts to reconnect.
>>>>>
>>>>> What is the best way to achieve the functionality we'd like?
>>>>>
>>>>> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option
>>>>> added in a later version of gRPC; would that cause the gRPC call to
>>>>> "fail fast" if I upgraded and set it to a low value, ~1 sec?
>>>>>
>>>>> Is there a better way to handle this situation in general?
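For reference, a rough sketch of how the channel args mentioned throughout
this thread are set from the C++ API. GRPC_ARG_MIN_RECONNECT_BACKOFF_MS only
exists in releases newer than the 1.3.9 mentioned above, and the values here
are illustrative, not recommendations:

#include <memory>
#include <string>
#include <grpc++/grpc++.h>

std::shared_ptr<grpc::Channel> MakeChannel(const std::string& target) {
  grpc::ChannelArguments args;
  // Tighten the reconnect backoff so a down server is retried sooner.
  args.SetInt(GRPC_ARG_MIN_RECONNECT_BACKOFF_MS, 1000);  // newer gRPC only
  args.SetInt(GRPC_ARG_MAX_RECONNECT_BACKOFF_MS, 2000);
  // Keepalive pings, so a dead connection can be noticed without an RPC.
  args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);
  args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 2000);
  args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
  return grpc::CreateCustomChannel(
      target, grpc::InsecureChannelCredentials(), args);
}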
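One more way to bound the ~20-second blocking on secondary calls, independent
of the backoff settings: a per-RPC deadline. A sketch using hypothetical
generated types (MyService, UpdateXRequest, and UpdateXReply stand in for the
real proto definitions):

#include <chrono>
#include <grpc++/grpc++.h>
#include "my_service.grpc.pb.h"  // hypothetical generated header

// If the channel is mid reconnect-backoff, the call completes with
// DEADLINE_EXCEEDED after ~1 second instead of waiting out the backoff.
grpc::Status UpdateXWithDeadline(MyService::Stub* stub,
                                 const UpdateXRequest& request,
                                 UpdateXReply* reply) {
  grpc::ClientContext ctx;
  ctx.set_deadline(std::chrono::system_clock::now() +
                   std::chrono::seconds(1));
  return stub->UpdateX(&ctx, request, reply);
}

The loop's "No Error" checks then work unchanged: any status other than OK
(UNAVAILABLE, DEADLINE_EXCEEDED, ...) skips the rest of the iteration.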
