Carl, thanks for your input.

In client-server development, especially for mobile, I can control certain 
things better than others. I have control over the server and can make it 
very reliable, so that I can mostly rule it out as a point of failure. By 
contrast, the network is chaotic. Network communication is going over the 
cellular network in my case, and I have to design for substantial 
connectivity problems because the app is generally used in remote 
locations. So usually the blame for communications problems in my case lies 
on the client-side network.

> How can you tell if a connection is broken?  Unless you receive a packet 
> saying the host isn't reachable, it's possible the remote endpoint is just 
> taking a long time.  It can't be distinguished from radio silence.
>

That's the crux of the problem - I usually can't tell if the connection is 
broken in a timely manner when a Deadline expires. So I have to use good 
heuristics (make informed guesses) in a way that minimizes the impact on 
app users. If I take an optimistic stance and reissue calls on an existing 
yet broken connection, I will waste time because all I may get at the 
application level is silence and another expired Deadline. If instead I 
take a pessimistic stance and assume that the connection is broken (whether 
or not that actually is the case), I can try a new connection (if it 
appears that I am online) and minimize the downtime. Doing one retry on the 
old connection can be a good idea, but if it fails I am still facing the 
same problem.

The real strategy may be a little more complex, but in the end I still need 
the ability to request a reconnect when my strategy dictates it (and when 
the Channel has no indication of recent data received). The current Channel 
design is overbearing when connections are not reliable.
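
To make the heuristic above concrete, here is a minimal sketch in plain Java. 
The class DeadlineReconnectPolicy, its Action values, and the thresholds are 
hypothetical names invented for illustration; none of this is grpc-java API. 
It only decides what to do after an expired Deadline. Actually forcing the 
reconnect is the piece I am asking the Channel to support.

```java
/** Hypothetical helper, not part of grpc-java: decides how to react to an expired Deadline. */
final class DeadlineReconnectPolicy {
    enum Action { RETRY_ON_SAME_CONNECTION, RECONNECT, WAIT_FOR_NETWORK }

    private final int retriesOnOldConnection;   // assumed tuning knob
    private final long recentDataWindowMillis;  // assumed tuning knob
    private int consecutiveExpiredDeadlines;
    private boolean anyDataReceived;
    private long lastDataReceivedMillis;

    DeadlineReconnectPolicy(int retriesOnOldConnection, long recentDataWindowMillis) {
        this.retriesOnOldConnection = retriesOnOldConnection;
        this.recentDataWindowMillis = recentDataWindowMillis;
    }

    /** Call whenever any response data arrives; the connection is evidently alive. */
    synchronized void onDataReceived(long nowMillis) {
        anyDataReceived = true;
        lastDataReceivedMillis = nowMillis;
        consecutiveExpiredDeadlines = 0;
    }

    /** Call when an RPC fails with an expired Deadline. */
    synchronized Action onDeadlineExpired(long nowMillis, boolean deviceAppearsOnline) {
        consecutiveExpiredDeadlines++;
        if (!deviceAppearsOnline) {
            // A new connection cannot succeed while the device is offline.
            return Action.WAIT_FOR_NETWORK;
        }
        boolean recentData = anyDataReceived
                && nowMillis - lastDataReceivedMillis <= recentDataWindowMillis;
        if (recentData || consecutiveExpiredDeadlines <= retriesOnOldConnection) {
            // Optimistic: give the existing connection another chance.
            return Action.RETRY_ON_SAME_CONNECTION;
        }
        // Pessimistic: assume the connection is broken, whether or not it is.
        return Action.RECONNECT;
    }
}
```

With retriesOnOldConnection set to 1, the first expired Deadline leads to one 
retry on the old connection, and the next one (with no data received in 
between) leads to a reconnect request.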

> The deadline mechanism isn't really for connection level usage, it's for 
> RPC level usage.
>
 
Agreed. From the application level perspective, however, an expired 
Deadline can simultaneously be an indicator of a broken or unavailable 
connection, so to take action it is important to have the ability to signal 
this problem to the connection/channel level right away, without having to 
wait for a keep alive ping and lack of response at their regularly 
scheduled intervals. That kind of wait may be fine on the server side, but 
it's not feasible for my interactive user-facing app. Worse, once every 
call has failed with an expired Deadline there are no active calls left, so 
KeepAlive pings stop being sent (unless keep-alive without calls is 
enabled), which exacerbates the problem.

I mean, I can work around the problem on the client side, but for my use 
case this means potentially issuing very frequent pings and doing so 
regardless of the existence of open calls. It's not a great design. I 
wouldn't have to do either of these things if I could signal to the Channel 
to reconnect when the Deadline expires.

> If you can listen for OS updates on Android, why not just kill the RPCs 
> yourself when you get notified?
>

Good idea - listening to OS events more proactively and killing RPCs could 
be useful. My question here was more about how best to kill all existing 
RPCs in grpc-java while minimizing races, and whether it is feasible to hide 
the messiness of Channel recreation by using Subchannels.
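
For the "kill all existing RPCs" part, one possible approach is sketched 
below in plain Java. InFlightCalls is a hypothetical class, not grpc-java 
API. It assumes each call registers a cancel handle when it starts (in 
grpc-java such a handle might wrap ClientCall.cancel(), but that wiring is 
an assumption and not shown) and unregisters when it completes. Swapping 
the set under a lock means calls started after cancelAll() are not affected 
by it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical registry of in-flight calls, not part of grpc-java. */
final class InFlightCalls {
    private final Object lock = new Object();
    private Set<Runnable> cancelHandles = new HashSet<>();

    /**
     * Registers a cancel handle for a newly started call.
     * Returns a Runnable the caller runs when the call completes normally.
     */
    Runnable register(Runnable cancelHandle) {
        final Set<Runnable> owner;
        synchronized (lock) {
            owner = cancelHandles;
            owner.add(cancelHandle);
        }
        return () -> {
            synchronized (lock) {
                // Harmless if cancelAll() has already swapped this set out.
                owner.remove(cancelHandle);
            }
        };
    }

    /** Cancels every call registered so far and returns how many were cancelled. */
    int cancelAll() {
        final List<Runnable> toCancel;
        synchronized (lock) {
            toCancel = new ArrayList<>(cancelHandles);
            cancelHandles = new HashSet<>(); // later calls start with a clean slate
        }
        // Run the handles outside the lock so they cannot deadlock with register().
        for (Runnable handle : toCancel) {
            handle.run();
        }
        return toCancel.size();
    }
}
```

This does not remove every race: a call that completes at the same moment it 
is cancelled may still see either outcome, so cancel handles should tolerate 
being run on a finished call.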

> And, even if you did do this, how can you tell if the connection is failed? 
>  For example, if the OS tells you the antenna is turned off, it may be 
> temporary and could turn on again with neither endpoint being the wiser. 
>  The connection is still active.
>

Very true, there is no telling if the connection broke.

So in summary, I think the ability to signal to the Channel to reconnect is 
still needed for the Android use case. I recall that one of the official 
gRPC design documents (perhaps the connection backoff document) suggests 
continuing to use the old connection if it recovers before the new 
connection is ready; the same could apply here.

There are already enhancement requests for RPC retries and reconnection 
backoff improvements, which I also sorely need.

Should I create an issue about the undocumented problem of expiring 
Deadlines interfering with KeepAlives? Maybe even just to add something to 
the Javadoc? While logical, I found this quite surprising.

 

> On Friday, August 11, 2017 at 6:24:48 PM UTC-7, Uli Bubenheimer wrote:
>>
>> I am seeing issues with how Deadline works at present in the presence of 
>> failing connections. What I found in my tests & research regarding my 
>> Android use case:
>>
>>
>>    - I am missing an option to explicitly fail a connection (signal that 
>>    the connection is broken) when a Deadline expires. An expired Deadline in 
>>    my Android use case would typically mean that the connection is broken, 
>>    and reconnecting may help.
>>    - Detecting broken connections with Keep-Alives does not work so well 
>>    in conjunction with Deadlines. If all RPCs have failed due to expired 
>>    Deadlines caused by a broken connection then using Keep-Alive won't 
>>    detect the broken connection unless I set 
>>    NettyServerBuilder.permitKeepAliveWithoutCalls(true) and 
>>    OkHttpChannelBuilder.keepAliveWithoutCalls(true), which judging by the 
>>    documentation does not seem the best idea.
>>
>> Deadlines are useful, for example, when there is a user waiting for a 
>> response - I have to take action when nothing comes back after 20 or 30 
>> seconds. Setting KeepAlive to such small values as a workaround would not 
>> be a good idea.
>>
>> I started looking at creating my own LoadBalancer as a workaround, which 
>> seems less architecturally insane than recreating the Channel. I am 
>> thinking when I sense a broken connection via an expired Deadline I can 
>> shutdown() the old Subchannel and create a new Subchannel for the same 
>> address. I'm not sure how to signal all other open RPCs to error out as if 
>> connection failure had been detected by Keep-Alive - I'd have that problem 
>> even with the channel recreation workaround.
>>
>> Thoughts?
>>
>
