Responses inline.

On Tuesday, August 15, 2017 at 11:47:07 PM UTC-7, Uli Bubenheimer wrote:
>
> Carl, thanks for your input.
>
> In client-server development, especially for mobile, I can control certain 
> things better than others. I have control over the server and can make it 
> very reliable, so that I can mostly rule it out as a point of failure. By 
> contrast, the network is chaotic. Network communication is going over the 
> cellular network in my case, and I have to design for substantial 
> connectivity problems because the app is generally used in remote 
> locations. So usually the blame for communications problems in my case lies 
> on the client-side network.
>
>> How can you tell if a connection is broken?  Unless you receive a packet 
>> saying the host isn't reachable, it's possible the remote endpoint is just 
>> taking a long time.  It can't be distinguished from radio silence.
>>
>
> That's the crux of the problem - I usually can't tell if the connection is 
> broken in a timely manner when a Deadline expires.
>

To be clear, I am not sure *anyone* can tell if a connection is gone.  The 
best bet is connection-level timeouts.  NettyChannelBuilder lets you set 
channel options for that, but I don't know if we have an equivalent on 
OkHttpChannelBuilder, which is likely what you are using.  We can probably 
expose one if you file a GitHub issue.  (ICMP packets are a close second, 
but I am not holding my breath for those.)
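
To illustrate why a timeout is about the only signal available, here is a 
minimal sketch using plain java.net sockets rather than gRPC.  The class and 
method names are made up for the example, and the "silent peer" is just a 
local listening socket that never writes.  The client connects fine and then 
hears nothing, so the only thing it ever learns is that its own read timeout 
fired.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of grpc-java.
class SilentPeerDemo {

    /**
     * Connects to a local peer that completes the TCP handshake but never
     * sends a byte, then waits up to readTimeoutMillis for data.
     * Returns true if the wait ended in a timeout.
     */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port. We never call accept(); the
        // listen backlog is enough for the client's connect() to succeed.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connection-level timeout for establishing the connection.
            client.connect(
                new InetSocketAddress(loopback, silentPeer.getLocalPort()), 1000);
            // Connection-level timeout for each blocking read.
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived: peer is not silent
            } catch (SocketTimeoutException e) {
                // No error packet, no reset: just silence until we gave up.
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

From the client's side, a peer that is merely slow and a peer that is gone 
look the same here until the timeout fires.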

 

> So I have to use good heuristics (make informed guesses) in a way that 
> minimizes the impact on app users. If I take an optimistic stance and 
> reissue calls on an existing yet broken connection, I will waste time 
> because all I may get at the application level is silence and another 
> expired Deadline. If instead I take a pessimistic stance and assume that 
> the connection is broken (whether or not that actually is the case), I can 
> try a new connection (if it appears that I am online) and minimize the 
> downtime. Doing one retry on the old connection can be a good idea, but if 
> it fails I am still facing the same problem.
>
> The real strategy may be a little more complex, but in the end I still 
> need the ability to request a reconnect when my strategy dictates it (and 
> when the Channel has no indication of recent data received). The current 
> Channel design is overbearing when connections are not reliable.
>
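
The heuristic described above (one retry on the old connection, then assume 
it is broken unless data arrived recently) can be sketched independently of 
gRPC.  Everything in this block is hypothetical: the class, the enum, and the 
inputs (whether the device appears online, time since data was last received) 
are assumptions the application would have to supply itself.

```java
import java.time.Duration;

// Hypothetical policy object; not part of grpc-java.
class ReconnectPolicy {

    enum Action { RETRY_ON_SAME_CONNECTION, REQUEST_RECONNECT, WAIT_FOR_NETWORK }

    private final Duration recentDataWindow;
    private int consecutiveExpiredDeadlines = 0;

    ReconnectPolicy(Duration recentDataWindow) {
        this.recentDataWindow = recentDataWindow;
    }

    /** Call whenever any response data arrives: the connection is evidently alive. */
    void onDataReceived() {
        consecutiveExpiredDeadlines = 0;
    }

    /** Decides what to do after an RPC's Deadline has expired. */
    Action onDeadlineExpired(boolean deviceAppearsOnline, Duration sinceLastData) {
        if (!deviceAppearsOnline) {
            // A new connection cannot succeed either, so do not count this one.
            return Action.WAIT_FOR_NETWORK;
        }
        consecutiveExpiredDeadlines++;
        boolean recentData = sinceLastData.compareTo(recentDataWindow) <= 0;
        if (recentData || consecutiveExpiredDeadlines == 1) {
            // Optimistic: recent traffic, or this is the first expiry in a row.
            return Action.RETRY_ON_SAME_CONNECTION;
        }
        // Pessimistic: repeated silence, so assume the connection is broken.
        consecutiveExpiredDeadlines = 0; // the new connection gets its own retry
        return Action.REQUEST_RECONNECT;
    }

    public static void main(String[] args) {
        ReconnectPolicy policy = new ReconnectPolicy(Duration.ofSeconds(5));
        Duration longSilence = Duration.ofSeconds(30);
        System.out.println(policy.onDeadlineExpired(true, longSilence));
        System.out.println(policy.onDeadlineExpired(true, longSilence));
        System.out.println(policy.onDeadlineExpired(false, longSilence));
    }
}
```

The sketch only makes the decision; how REQUEST_RECONNECT would be carried 
out against a Channel is exactly the open question in this thread.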
>> The deadline mechanism isn't really for connection level usage, it's for 
>> RPC level usage.
>>
>  
> Agreed. From the application level perspective, however, an expired 
> Deadline can simultaneously be an indicator of a broken or unavailable 
> connection, so to take action it is important to have the ability to signal 
> this problem to the connection/channel level right away, without having to 
> wait for a keep alive ping and lack of response at their regularly 
> scheduled intervals. That kind of wait may be fine on the server side, but 
> it's not feasible for my interactive user-facing app. And expired Deadlines 
> can cancel out KeepAlive pings due to no active calls, which exacerbates 
> the problem.
>
> I mean, I can work around the problem on the client side, but for my use 
> case this means potentially issuing very frequent pings and doing so 
> regardless of the existence of open calls. It's not a great design. I 
> wouldn't have to do either of these things if I could signal to the Channel 
> to reconnect when the Deadline expires.
>
>> If you can listen for OS updates on Android, why not just kill the RPCs 
>> yourself when you get notified?
>>
>
> Good idea - doing more proactive OS listening & killing RPCs could come in 
> useful. My question in this regard was more about how to best kill all 
> existing RPCs in grpc-java while minimizing races, and the general 
> feasibility of attempting to hide the messiness of Channel recreation by 
> using Subchannels.
>

Well, I think I need to eat crow: killing RPCs is likely not a good idea. 
Sadly, killing the channel is not a good idea either.  Your options:

* Kill the RPC.  There is then no chance for automatic retries, an 
upcoming feature that you probably do want.
* Kill the channel.  To answer your specific question, calling 
channel.shutdownNow() is the safest way of killing all RPCs.  It will also 
kill all connections owned by that channel, including ones that were not 
broken!  Safest but worst option.
* Kill the connections.  I don't think such a thing exists today, because 
connection management is harder if we let users have direct access to them. 
Open to ideas on how to safely expose them.
 

>
>
>> And, even if you did do this, how can you tell if the connection has 
>> failed?  For example, if the OS tells you the antenna is turned off, it may 
>> be temporary and could turn on again with neither endpoint being the wiser. 
>> The connection is still active.
>>
>
>  Very true, there is no telling if the connection broke.
>
> So in summary I think the ability to signal to the Channel to reconnect is 
> still needed for the Android use case. I recall that one of the official 
> grpc design documents (connection backoff document perhaps) mentions to 
> just continue using the old connection instead if it recovers before the 
> new connection is ready; the same could be true here.
>
> There are already enhancement requests for RPC retries and reconnection 
> backoff improvements, which I also sorely need.
>
> Should I create an issue about the undocumented problem of expiring 
> Deadlines interfering with KeepAlives? Maybe even just to add something to 
> the Javadoc? While logical, I found this quite surprising.
>

Yes, I think so.  It would at least result in docs being made clearer.
 

>
>  
>
>> On Friday, August 11, 2017 at 6:24:48 PM UTC-7, Uli Bubenheimer wrote:
>>>
>>> I am seeing issues with how Deadline works at present in the presence of 
>>> failing connections. What I found in my tests & research regarding my 
>>> Android use case:
>>>
>>>
>>>    - I am missing an option to explicitly fail a connection (signal 
>>>    that the connection is broken) when a Deadline expires. An expired 
>>>    Deadline in my Android use case would typically mean that the 
>>>    connection is broken, and reconnecting may help.
>>>    - Detecting broken connections with Keep-Alives does not work so 
>>>    well in conjunction with Deadlines. If all RPCs have failed due to 
>>>    expired Deadlines caused by a broken connection then using Keep-Alive 
>>>    won't detect the broken connection unless I set 
>>>    NettyServerBuilder.permitKeepAliveWithoutCalls(true) and 
>>>    OkHttpChannelBuilder.keepAliveWithoutCalls(true), which judging by the 
>>>    documentation does not seem the best idea.
>>>
>>> Deadlines are useful, for example, when there is a user waiting for a 
>>> response - I have to take action when nothing comes back after 20 or 30 
>>> seconds. Setting KeepAlive to such small values as a workaround would not 
>>> be a good idea.
>>>
>>> I started looking at creating my own LoadBalancer as a workaround, which 
>>> seems less architecturally insane than recreating the Channel. I am 
>>> thinking when I sense a broken connection via an expired Deadline I can 
>>> shutdown() the old Subchannel and create a new Subchannel for the same 
>>> address. I'm not sure how to signal all other open RPCs to error out as if 
>>> connection failure had been detected by Keep-Alive - I'd have that problem 
>>> even with the channel recreation workaround.
>>>
>>> Thoughts?
>>>
>>
