I found a fix for my problem. Looking at the latest source, I saw a test argument, "grpc.testing.fixed_reconnect_backoff_ms", that sets both the min and max backoff times to the given value. I found the 1.3.9 source on a machine and the argument was in use there as well. Setting it to 2000 ms does what I wanted. 1000 ms seemed to be too low a value; the RPC calls continued to fail. 2000 ms reconnects just fine. Once we update to a newer version of gRPC I can switch to setting the min and max backoff times directly.
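For anyone hitting the same issue, here is roughly how I'm setting it. This is a minimal sketch assuming the C++ API (the target and credentials are illustrative); there is no public #define for this key in 1.3.x, the string below is what the core reads:

    #include <grpc++/grpc++.h>

    // Pin both the min and max reconnect backoff to 2000 ms via the
    // test-only channel argument found in the gRPC core source.
    std::shared_ptr<grpc::Channel> MakeFixedBackoffChannel(
        const std::string& target) {
      grpc::ChannelArguments args;
      args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 2000);
      return grpc::CreateCustomChannel(
          target, grpc::InsecureChannelCredentials(), args);
    }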
On Wednesday, November 21, 2018 at 11:24:58 AM UTC-5, Robert Engels wrote:
>
> Sorry. Still, if you forcibly remove the cable or hard shut down the server, the client can't tell that the server is down without some sort of ping/pong protocol with a timeout. The TCP/IP timeout is on the order of hours, and minutes if keepalive is set.
>
> On Nov 21, 2018, at 10:20 AM, [email protected] wrote:
>
> That must have been a different person :). I'm actually taking the server down and restarting it, not simulating it.
>
> On Wednesday, November 21, 2018 at 11:17:24 AM UTC-5, Robert Engels wrote:
>>
>> I thought your original message said you were simulating the server going down using iptables and causing packet loss?
>>
>> On Nov 21, 2018, at 9:52 AM, [email protected] wrote:
>>
>> I'm not sure I follow you on that one. I am taking the server up and down myself. Everything works fine if I just make RPC calls on the client and check the error codes. The problem was the 20 seconds of blocking on secondary RPC calls during the reconnect, which seems to be due to the backoff algorithm. I was hoping to shrink that wait to something smaller if possible. Setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS to 5000 still seemed to take the full 20 seconds when making an RPC call.
>>
>> Using GetState on the channel looked like it would get rid of the blocking on a broken connection, but the state of the channel doesn't seem to change from transient failure once the server comes back up. I tried KEEPALIVE_TIME, KEEPALIVE_TIMEOUT, and KEEPALIVE_PERMIT_WITHOUT_CALLS, but those didn't seem to trigger a state change on the channel.
>>
>> It seems the only way to trigger a state change on the channel is to make an actual RPC call.
>>
>> I think the answer might just be to update to a newer version of gRPC, look at using the MIN_RECONNECT_BACKOFF channel arg, and probably download the source and see how those variables are used :).
>>
>> On Wednesday, November 21, 2018 at 10:16:45 AM UTC-5, Robert Engels wrote:
>>>
>>> The other thing to keep in mind is that the way you are "forcing failure" is error-prone: the connection is valid as long as packets are making it through; it is just that it will be very slow due to extreme packet loss. I am not sure this is considered a failure by gRPC. I think you would need to detect slow network connections and abort that server yourself.
>>>
>>> On Nov 21, 2018, at 9:12 AM, [email protected] wrote:
>>>
>>> I do check the error code after each update and skip the rest of the current iteration's updates if a failure occurred.
>>>
>>> I could skip all updates for 20 seconds after a failed update, but that seems less than ideal.
>>>
>>> By "server available" I meant using GetState on the channel. The problem I was running into was that if I only call GetState on the channel to see if the server is around, it "forever" stays in the transient failure state (at least for 60 seconds). I was expecting to see a state change back to idle/ready after a bit.
>>>
>>> On Tuesday, November 20, 2018 at 11:19:09 PM UTC-5, robert engels wrote:
>>>>
>>>> You should track the err after each update and, if non-nil, just return; why keep trying the further updates in that loop?
>>>>
>>>> It is also trivial to not even attempt the next loop if it has been less than N ms since the last error.
>>>>
>>>> According to your pseudocode, you already have the 'server available' status.
>>>>
>>>> On Nov 20, 2018, at 9:22 PM, [email protected] wrote:
>>>>
>>>> gRPC version: 1.3.9
>>>> Platform: Windows
>>>>
>>>> I'm working on a prototype application that periodically calculates data and then pushes it to a server in a multi-step process. The design is that the server doesn't need to be up, and it can go down mid-process. The client should not block (or should block as little as possible) between updates if there is a problem pushing data.
>>>>
>>>> A simple model for the client would be:
>>>>
>>>> Loop Until Done
>>>> {
>>>>     Calculate Data
>>>>     If Server Available and No Error: Begin Update
>>>>     If Server Available and No Error: UpdateX (optional)
>>>>     If Server Available and No Error: UpdateY (optional)
>>>>     If Server Available and No Error: UpdateZ (optional)
>>>>     If Server Available and No Error: End Update
>>>> }
>>>>
>>>> The client doesn't care whether the server is available, but if it is, it should push data; on any error, skip everything else until the next update.
>>>>
>>>> The problem is that if I make a call on the client while the server isn't available, the first call fails very quickly (~1 sec) and the rest take a "long" time, ~20 sec. It looks like this is due to the reconnect backoff time. I tried setting GRPC_ARG_MAX_RECONNECT_BACKOFF_MS on the channel args to a lower value (2000), but that didn't have any positive effect.
>>>>
>>>> I tried using GetState(true) on the channel to determine whether we need to skip an update. This check fails very quickly, but the channel never seems to get out of the transient failure state after the server is restarted (I waited over 60 seconds). From the documentation, it looked like the parameter to GetState only controls whether a channel in the idle state attempts to reconnect.
>>>>
>>>> What is the best way to achieve the functionality we'd like?
>>>>
>>>> I noticed there is a new GRPC_ARG_MIN_RECONNECT_BACKOFF_MS option added in a later version of gRPC. Would that cause the gRPC call to "fail fast" if I upgraded and set it to a low value, ~1 sec?
>>>>
>>>> Is there a better way to handle this situation in general?
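One more note on the blocking problem described above: independent of the backoff tuning, putting a deadline on each update call caps how long any single call can block while the channel is reconnecting, so the secondary calls cannot wait out the full backoff. A rough sketch, assuming the C++ sync API; MyService::Stub, Update, UpdateRequest, and UpdateReply are hypothetical stand-ins for the real generated types:

    #include <chrono>
    #include <grpc++/grpc++.h>

    // Returns false on any failure so the caller can skip the rest of
    // this iteration's updates.
    bool TryUpdate(MyService::Stub* stub, const UpdateRequest& request) {
      grpc::ClientContext ctx;
      // Cap the call at 500 ms: it cannot block past the deadline (it
      // fails with DEADLINE_EXCEEDED, or sooner with UNAVAILABLE).
      ctx.set_deadline(std::chrono::system_clock::now() +
                       std::chrono::milliseconds(500));
      UpdateReply reply;
      grpc::Status status = stub->Update(&ctx, request, &reply);
      return status.ok();
    }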
