This was also an issue I considered for the Avatica Go client when
working on the last release. I also came to the conclusion that while
implementation is not insurmountable, the amount of effort required is
not insignificant.
If you're able to push this out to a service mesh such as Istio, or to a
library such as Netflix's Hystrix, then it's possible to lean on them to
retry requests when a server fails. This does not solve the issue of
increased latency, but hopefully server failures are infrequent enough
that this would not be a huge issue. In most cases, I think choosing the
generic load-balancing option coupled with a service mesh or a client
library for retrying would get you very close to an ideal environment.
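For what it's worth, the retry behaviour itself is not much code; the value
of a mesh or a library is that you don't have to own it yourself. A rough,
purely illustrative sketch in Java (the Retry helper, the attempt count and
the backoff values are all made up for this example, not anything from
Avatica) might look like this:

import java.util.concurrent.Callable;

public class Retry {

    // Illustrative only: retry an operation a few times with a linear backoff.
    // In practice you would let Istio or a library such as Hystrix own this.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // back off before retrying
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        // In this discussion the operation would be the JDBC call made against
        // the Avatica endpoint sitting behind the load balancer.
        String result = withRetry(() -> "ok", 3, 500);
        System.out.println(result);
    }
}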
Francis
On 10/08/2018 6:08 AM, Josh Elser wrote:
The decision to avoid "routing logic" in the client came down to other
systems being able to do a better job of it than we can in Avatica. There
are other systems specifically designed for this -- it's a clear
architectural boundary that says one Avatica client expects to talk to
one Avatica server.
On 8/9/18 1:21 PM, JD Zheng wrote:
Hi, Josh,
Thank you for sharing your experience, and for the nice write-up on the
Hortonworks website too. It's very helpful. I am just curious why
"not implement routing logic in the client" was one of the original
design goals? Wouldn't it make Avatica easier to use? I agree that
sharing state between Avatica servers adds too much complexity to be
worth it.
Is the concern with client "smarts" that a retried request will most
likely go to the same server and fail again, making the overall response
time unnecessarily long?
-Jiandan
On Aug 9, 2018, at 8:43 AM, Josh Elser <[email protected]> wrote:
Hi Jiandan,
Glad you found my write-up on this. One of the original design goals
was to *not* implement routing logic in the client. Sticky-sessions
is by far the easiest way to implement this.
There is some retry logic in the Avatica client to resubmit requests
when a server responds that it doesn't have a connection/statement
cached that the client thinks it should have (e.g. the load balancer
flipped the client to a newer server). I'm still a little concerned
about this level of "smarts" :)
I don't know if there is a fancier solution that we can do in
Avatica. We could consider sharing state between Avatica servers,
but I think it is database-dependent as to whether or not you could
correctly reconstruct an iteration through a result set.
I had talked with a dev on the Apache Hive project. He suggested
that HiveServer2 just fails the query when the client is mid-query
and the server dies (which is reasonable -- servers failing should
be an infrequent occurrence).
On 8/8/18 8:09 PM, JD Zheng wrote:
Hi,
Our query engine uses Calcite as its parser/optimizer and the Enumerable
runtime, when needed, to federate different storage engines. We are
trying to enable JDBC access to our query engine. Everything works
smoothly when we only have one Calcite/Avatica server.
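For context, clients connect through the Avatica remote JDBC driver; a
minimal example looks roughly like the following (the host name and the
query are placeholders, and the Avatica client jar is assumed to be on the
classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SingleServerExample {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; the serialization setting matches what the
        // server is configured with.
        String url = "jdbc:avatica:remote:url=http://avatica.example.com:8765"
            + ";serialization=protobuf";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}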
However, JDBC calls fail when we run multiple instances of the
Calcite/Avatica server behind a generic load balancer. Given that the
JDBC server is not stateless, this problem was not a surprise. I
searched around, and here are the two options suggested by the Phoenix
developers
(https://community.hortonworks.com/articles/9377/deploying-the-phoenix-query-server-in-production-e.html):
1. Sticky sessions: have the router always route a given client to the
same server.
2. Client-driven routing: implement Avatica's protocol for passing an
identifier to the load balancer to control how the request is routed to
the backend servers (see the sketch after this list).
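To make option 2 a bit more concrete, here is a purely hypothetical sketch
of the idea in Java: derive a routing key from a per-connection identifier
and use it to pick a stable backend. None of this is Avatica API, and the
RoutingSketch class, backend hosts and hashing scheme are all invented for
illustration only.

import java.util.List;
import java.util.UUID;

public class RoutingSketch {

    // Hypothetical backend addresses; replace with your real Avatica servers.
    private static final List<String> BACKENDS = List.of(
        "http://avatica-1.example.com:8765",
        "http://avatica-2.example.com:8765",
        "http://avatica-3.example.com:8765");

    // Pick a backend deterministically from the connection identifier, so
    // every request for that connection lands on the same server.
    static String backendFor(String connectionId) {
        int bucket = Math.floorMod(connectionId.hashCode(), BACKENDS.size());
        return BACKENDS.get(bucket);
    }

    public static void main(String[] args) {
        // Any stable per-connection identifier would work for this illustration.
        String connectionId = UUID.randomUUID().toString();
        System.out.println(connectionId + " -> " + backendFor(connectionId));
    }
}

In a real deployment the key would presumably travel as a header or cookie
that the load balancer hashes on, rather than the client choosing the host
directly.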
Before we rush into any implementation, we would really appreciate
it if anyone could share experience or thoughts on this issue.
Thanks,
-Jiandan