Re: [grpc-io] GRPC C++ Question on best practices for Client handling of servers going up and down

justin . cheuvront Wed, 21 Nov 2018 07:53:04 -0800

I'm not sure I follow you on that one. I am taking the server up and down 
myself. Everything works fine if I just make RPC calls on the client and 
check the error codes. The problem was that subsequent RPC calls block for 
about 20 seconds during the reconnect, which seems to be due to the backoff 
algorithm. I was hoping to shrink that wait to something smaller if 
possible. Even with GRPC_ARG_MAX_RECONNECT_BACKOFF_MS set to 5000, an RPC 
call still seemed to take the full 20 seconds.
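
For reference, a rough sketch of how a capped exponential backoff schedule behaves. This is not gRPC's code: BackoffParams, BackoffSchedule and AttemptTimeoutMs are hypothetical names, and the default values (1 s initial, 1.6 multiplier, 120 s cap, 20 s minimum connect timeout) are the ones I recall from gRPC's connection-backoff document, so treat them as assumptions for whichever version is in use. Jitter is left out. If each connection attempt is allowed at least the minimum connect timeout, lowering only the cap would not shorten a 20 second wait, which might be what is happening here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative parameters only. The defaults are assumptions taken from
// memory of gRPC's connection-backoff document, not from a specific release.
struct BackoffParams {
  double initial_ms = 1000.0;
  double multiplier = 1.6;
  double max_ms = 120000.0;
  double min_connect_timeout_ms = 20000.0;
};

// Nominal (jitter-free) delay before each of the first `attempts` reconnects.
inline std::vector<std::int64_t> BackoffSchedule(const BackoffParams& p,
                                                 int attempts) {
  assert(p.multiplier >= 1.0 && attempts >= 0);
  std::vector<std::int64_t> delays;
  double current = std::min(p.initial_ms, p.max_ms);
  for (int i = 0; i < attempts; ++i) {
    delays.push_back(static_cast<std::int64_t>(std::llround(current)));
    // Grow the delay geometrically, but never beyond the configured cap.
    current = std::min(current * p.multiplier, p.max_ms);
  }
  return delays;
}

// In this model a connection attempt may run for at least the minimum
// connect timeout, regardless of how small the current backoff is.
inline std::int64_t AttemptTimeoutMs(const BackoffParams& p,
                                     std::int64_t backoff_ms) {
  const auto floor_ms =
      static_cast<std::int64_t>(std::llround(p.min_connect_timeout_ms));
  return std::max(backoff_ms, floor_ms);
}
```

With the cap set to 5000 ms, the nominal delays in this model are 1000, 1600, 2560, 4096, 5000, 5000 ms, while the per-attempt timeout stays at 20000 ms.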


Using GetState on the channel looked like it was going to get rid of the 
blocking on a broken connection, but the state of the channel doesn't seem 
to change from transient failure once the server comes back up. I tried 
using KEEPALIVE_TIME, KEEPALIVE_TIMEOUT and KEEPALIVE_PERMIT_WITHOUT_CALLS, 
but those didn't seem to trigger a state change on the channel.

It seems like the only way to trigger a state change on the channel is to 
make an actual RPC call.
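
A minimal sketch of polling the channel with a bounded wait instead of issuing an RPC. WaitUntilReady is a hypothetical helper, written as a template so it compiles without gRPC headers. It assumes the channel type offers GetState(bool try_to_connect) and WaitForStateChange(last_observed, deadline), which is the shape grpc::Channel exposes, and that the caller passes GRPC_CHANNEL_READY as ready_state. Whether the state actually leaves transient failure without an RPC may depend on the gRPC version, as described above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of gRPC. Returns true once the channel
// reports `ready_state`, or false if `budget` elapses first.
template <typename ChannelT, typename StateT>
bool WaitUntilReady(ChannelT& channel, StateT ready_state,
                    std::chrono::milliseconds budget) {
  assert(budget.count() >= 0);
  const auto deadline = std::chrono::system_clock::now() + budget;
  // Passing true asks an idle channel to start connecting.
  auto state = channel.GetState(/*try_to_connect=*/true);
  while (state != ready_state) {
    // Assumed to block until the state differs from `state` (returns true)
    // or the deadline passes (returns false).
    if (!channel.WaitForStateChange(state, deadline)) {
      return false;
    }
    state = channel.GetState(/*try_to_connect=*/true);
  }
  return true;
}
```

With a real std::shared_ptr<grpc::Channel> named channel, the call would presumably look like WaitUntilReady(*channel, GRPC_CHANNEL_READY, std::chrono::milliseconds(200)), skipping the update when it returns false.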

I think the answer might just be to update to a newer version of gRPC, look 
at using the MIN_RECONNECT_BACKOFF channel arg setting, and probably 
download the source to see how those variables are used :). 
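
Putting the update loop from the original question (quoted below) together with the "skip for N ms after an error" suggestion, a sketch could look like this. FailureGate and RunUpdate are hypothetical names, and each step stands in for one RPC (Begin Update, UpdateX, and so on) returning true on success. In real code each step would presumably also set a deadline on its grpc::ClientContext so that a single call cannot block for long.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: remembers the last failure and suppresses new
// attempts until `cooldown` has passed since it.
class FailureGate {
 public:
  explicit FailureGate(std::chrono::milliseconds cooldown)
      : cooldown_(cooldown) {
    assert(cooldown.count() >= 0);
  }
  bool ShouldAttempt(Clock::time_point now) const {
    return !has_failed_ || now - last_failure_ >= cooldown_;
  }
  void RecordFailure(Clock::time_point now) {
    has_failed_ = true;
    last_failure_ = now;
  }
  void RecordSuccess() { has_failed_ = false; }

 private:
  std::chrono::milliseconds cooldown_;
  bool has_failed_ = false;
  Clock::time_point last_failure_{};
};

// Runs the steps in order and stops at the first failure, so the rest of the
// iteration is skipped. Returns the number of steps that succeeded.
inline std::size_t RunUpdate(FailureGate& gate,
                             const std::vector<std::function<bool()>>& steps,
                             Clock::time_point now) {
  if (!gate.ShouldAttempt(now)) {
    return 0;  // Still cooling down after an earlier error.
  }
  std::size_t done = 0;
  for (const auto& step : steps) {
    if (!step()) {
      gate.RecordFailure(now);
      return done;
    }
    ++done;
  }
  gate.RecordSuccess();
  return done;
}
```

The client loop would call RunUpdate once per iteration after calculating its data, passing Clock::now(), and carry on regardless of the return value.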


On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>
> The other thing to keep in mind is that the way you are “forcing failure” 
> is error prone - the connection is valid as packets are making it through. 
> It is just that it will be very slow due to extreme packet loss. I am not 
> sure this is considered a failure by gRPC. I think you would need to detect 
> slow network connections and abort that server yourself. 
>
> On Nov 21, 2018, at 9:12 AM, [email protected] wrote:
>
> I do check the error code after each update and skip the rest of the 
> current iteration's updates if a failure occurred.
>
> I could skip all updates for 20 seconds after a failed update, but that 
> seems less than ideal.
>
> By "server available" I meant calling GetState on the channel. The problem 
> I was running into was that if I only call GetState on the channel to see 
> if the server is around, it "forever" stays in the state of transient 
> failure (at least for 60 seconds). I was expecting to see a state change 
> back to idle/ready after a bit.
>
> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>
>> You should track the err after each update, and if non-nil, just return… 
>> why keep trying the further updates in that loop?
>>
>> It is also trivial to not even attempt the next loop if it has been 
>> less than N ms since the last error.
>>
>> According to your pseudo code, you already have the ‘server available’ 
>> status.
>>
>> On Nov 20, 2018, at 9:22 PM, [email protected] wrote:
>>
>> GRPC Version: 1.3.9
>> Platform: Windows
>>
>> I'm working on a prototype application that periodically calculates data 
>> and then in a multi-step process pushes the data to a server. The design is 
>> that the server doesn't need to be up or can go down mid process. The 
>> client will not block (or block as little as possible) between updates if 
>> there is a problem pushing data.
>>
>> A simple model for the client would be:
>> Loop Until Done
>> {
>>  Calculate Data
>>  If Server Available and No Error Begin Update
>>  If Server Available and No Error UpdateX (Optional)
>>  If Server Available and No Error UpdateY (Optional)
>>  If Server Available and No Error UpdateZ (Optional)
>>  If Server Available and No Error End Update
>> }
>>
>> The client doesn't care whether the server is available, but if it is, it 
>> should push data; on any error, it skips everything else until the next 
>> update.
>>
>> The problem is that if I make a call on the client (and the server isn't 
>> available) the first fails very quickly (~1sec) and the rest take a "long" 
>> time, ~20sec. It looks like this is due to the reconnect backoff time. I 
>> tried setting the GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args to 
>> a lower value (2000) but that didn't have any positive effect.
>>
>> I tried using GetState(true) on the channel to determine if we need to 
>> skip an update. This call returns very quickly, but the channel never 
>> seems to get out of the transient failure state after the server was 
>> started (waited for over 60 seconds). From the documentation, it looked 
>> like the parameter to GetState only matters when the channel is in the 
>> idle state, where it triggers a connection attempt.
>>
>> What is the best way to achieve the functionality we'd like?
>>
>> I noticed there was a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added 
>> in a later version of grpc, would that cause the grpc call to "fail fast" 
>> if I upgraded and set that to a low value ~1sec?
>>
>> Is there a better way to handle this situation in general?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "grpc.io" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/grpc-io.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/grpc-io/9fb7bf54-88fa-4781-8864-c9b2b06d5f0e%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>

