Carl, thanks for your input. In client-server development, especially for mobile, I can control certain things better than others. I have control over the server and can make it very reliable, so that I can mostly rule it out as a point of failure. By contrast, the network is chaotic. Network communication is going over the cellular network in my case, and I have to design for substantial connectivity problems because the app is generally used in remote locations. So usually the blame for communications problems in my case lies on the client-side network.
> How can you tell if a connection is broken? Unless you receive a packet
> saying the host isn't reachable, its possible the remote endpoint is just
> taking a long time. It can't be distinguished from radio silence.

That's the crux of the problem - I usually can't tell if the connection is broken in a timely manner when a Deadline expires. So I have to use good heuristics (make informed guesses) in a way that minimizes the impact on app users. If I take an optimistic stance and reissue calls on an existing yet broken connection, I will waste time because all I may get at the application level is silence and another expired Deadline. If instead I take a pessimistic stance and assume that the connection is broken (whether or not that actually is the case), I can try a new connection (if it appears that I am online) and minimize the downtime. Doing one retry on the old connection can be a good idea, but if it fails I am still facing the same problem. The real strategy may be a little more complex, but in the end I still need the ability to request a reconnect when my strategy dictates it (and when the Channel has no indication of recent data received). The current Channel design is overbearing when connections are not reliable.

> The deadline mechanism isn't really for connection level usage, it's for
> RPC level usage.

Agreed. From the application level perspective, however, an expired Deadline can simultaneously be an indicator of a broken or unavailable connection, so to take action it is important to have the ability to signal this problem to the connection/channel level right away, without having to wait for a keep alive ping and lack of response at their regularly scheduled intervals. That kind of wait may be fine on the server side, but it's not feasible for my interactive user-facing app. And expired Deadlines can cancel out KeepAlive pings due to no active calls, which exacerbates the problem.
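To make the heuristic above a bit more concrete, here is a minimal sketch in plain Java of the decision I have in mind when a Deadline expires. Everything in it is hypothetical and only for illustration: the class name ReconnectPolicy, the Action values, and both constructor parameters are my own inventions, and the code calls no grpc-java API at all. How RECONNECT would actually be carried out against a Channel is exactly the open question of this thread.

```java
/**
 * Hypothetical client-side policy (not part of grpc-java): decides what to
 * do when an RPC Deadline expires. The caller maps each Action onto whatever
 * mechanism it has available (reissue the call, recreate the Channel, ...).
 */
final class ReconnectPolicy {
    enum Action { RETRY_ON_SAME_CONNECTION, RECONNECT, WAIT_FOR_NETWORK }

    private final int maxRetriesOnSameConnection;
    private final long recentDataWindowMillis;
    private int consecutiveExpiries = 0;
    private long lastDataReceivedMillis = -1; // -1 means nothing received yet

    ReconnectPolicy(int maxRetriesOnSameConnection, long recentDataWindowMillis) {
        this.maxRetriesOnSameConnection = maxRetriesOnSameConnection;
        this.recentDataWindowMillis = recentDataWindowMillis;
    }

    /** Call whenever any response data arrives on the connection. */
    void onDataReceived(long nowMillis) {
        lastDataReceivedMillis = nowMillis;
        consecutiveExpiries = 0;
    }

    /** Call when an RPC Deadline expires; returns the suggested next step. */
    Action onDeadlineExpired(long nowMillis, boolean deviceAppearsOnline) {
        // A new connection cannot succeed while the device looks offline.
        if (!deviceAppearsOnline) {
            return Action.WAIT_FOR_NETWORK;
        }
        // Recent data suggests the connection is alive and the call was just slow.
        boolean recentData = lastDataReceivedMillis >= 0
                && nowMillis - lastDataReceivedMillis <= recentDataWindowMillis;
        if (recentData) {
            return Action.RETRY_ON_SAME_CONNECTION;
        }
        // Optimistic phase: allow a bounded number of retries on the old connection.
        consecutiveExpiries++;
        if (consecutiveExpiries <= maxRetriesOnSameConnection) {
            return Action.RETRY_ON_SAME_CONNECTION;
        }
        // Pessimistic phase: assume the connection is broken and ask for a new one.
        consecutiveExpiries = 0;
        return Action.RECONNECT;
    }

    public static void main(String[] args) {
        ReconnectPolicy policy = new ReconnectPolicy(1, 5_000L);
        policy.onDataReceived(0L);
        System.out.println(policy.onDeadlineExpired(20_000L, true));  // RETRY_ON_SAME_CONNECTION
        System.out.println(policy.onDeadlineExpired(40_000L, true));  // RECONNECT
        System.out.println(policy.onDeadlineExpired(60_000L, false)); // WAIT_FOR_NETWORK
    }
}
```

With one permitted retry and a 5 second "recent data" window, the first expired Deadline leads to a retry on the old connection and the second one asks for a reconnect. The values are arbitrary; a real app would tune them, and would probably feed deviceAppearsOnline from OS connectivity notifications.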
I mean, I can work around the problem on the client side, but for my use case this means potentially issuing very frequent pings and doing so regardless of the existence of open calls. It's not a great design. I wouldn't have to do either of these things if I could signal to the Channel to reconnect when the Deadline expires.

> If you can listen for OS updates on Android, why not just kill the RPCs
> your self when you get notified?

Good idea - doing more proactive OS listening & killing RPCs could come in useful. My question in this regard was more about how to best kill all existing RPCs in grpc-java while minimizing races, and the general feasibility of attempting to hide the messiness of Channel recreation by using Subchannels.

> And, even if you did do this, how can you tell if the connection is failed?
> For example, if the OS tells you the antenna is turned off, it may be
> temporary and could turn on again with neither endpoint being the wiser.
> The connection is still active.

Very true, there is no telling if the connection broke.

So in summary I think the ability to signal to the Channel to reconnect is still needed for the Android use case. I recall that one of the official grpc design documents (connection backoff document perhaps) mentions to just continue using the old connection instead if it recovers before the new connection is ready; the same could be true here. There are already enhancement requests for RPC retries and reconnection backoff improvements, which I also sorely need.

Should I create an issue about the undocumented problem of expiring Deadlines interfering with KeepAlives? Maybe even just to add something to the Javadoc? While logical, I found this quite surprising.

> On Friday, August 11, 2017 at 6:24:48 PM UTC-7, Uli Bubenheimer wrote:
>>
>> I am seeing issues with how Deadline works at present in the presence of
>> failing connections.
>> What I found in my tests & research regarding my Android use case:
>>
>> - I am missing an option to explicitly fail a connection (signal that
>>   the connection is broken) when a Deadline expires. An expired Deadline in
>>   my Android use case would typically mean that the connection is broken,
>>   and reconnecting may help.
>> - Detecting broken connections with Keep-Alives does not work so well
>>   in conjunction with Deadlines. If all RPCs have failed due to expired
>>   Deadlines caused by a broken connection then using Keep-Alive won't
>>   detect the broken connection unless I set
>>   NettyServerBuilder.permitKeepAliveWithoutCalls(true) and
>>   OkHttpChannelBuilder.keepAliveWithoutCalls(true), which judging by the
>>   documentation does not seem the best idea.
>>
>> Deadlines are useful, for example, when there is a user waiting for a
>> response - I have to take action when nothing comes back after 20 or 30
>> seconds. Setting KeepAlive to such small values as a workaround would not
>> be a good idea.
>>
>> I started looking at creating my own LoadBalancer as a workaround, which
>> seems less architecturally insane than recreating the Channel. I am
>> thinking when I sense a broken connection via an expired Deadline I can
>> shutdown() the old Subchannel and create a new Subchannel for the same
>> address. I'm not sure how to signal all other open RPCs to error out as if
>> connection failure had been detected by Keep-Alive - I'd have that problem
>> even with the channel recreation workaround.
>>
>> Thoughts?
