Thomas Tauber-Marshall created KNOX-2054:
--------------------------------------------
Summary: Accommodate APIs with multiple endpoints
Key: KNOX-2054
URL: https://issues.apache.org/jira/browse/KNOX-2054
Project: Apache Knox
Issue Type: Improvement
Reporter: Thomas Tauber-Marshall
A service definition was recently added to support accessing Impala via
HiveServer2.
One issue here is that Impala has historically used multiple "coordinators",
nodes that can accept queries. (Originally, all nodes acted as both a
coordinator and an executor, though Impala has recently moved to dedicated
coordinators and executors for scalability reasons.)
In Knox, when you create a topology you can specify multiple URLs for a single
service, but my understanding is that Knox will only ever direct requests to
one of those URLs. Only if that URL becomes unavailable is another URL chosen,
and all requests are then directed to it.
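The dispatch behavior described above can be modeled with a minimal sketch.
This follows the reporter's understanding only and is not Knox's actual
implementation: the class name, method names, and coordinator hostnames are
all hypothetical.

```python
class FailoverSelector:
    """Hypothetical model of failover-only dispatch: one active URL at a time."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one URL is required")
        self.urls = list(urls)
        self.active = 0  # index of the URL that currently receives all requests

    def current(self):
        """Return the single URL that every request is sent to."""
        return self.urls[self.active]

    def mark_failed(self):
        """Switch to the next configured URL, wrapping around at the end."""
        self.active = (self.active + 1) % len(self.urls)
        return self.current()


# Made-up coordinator addresses, for illustration only.
selector = FailoverSelector([
    "http://coordinator-1.example.com:28000",
    "http://coordinator-2.example.com:28000",
])
first = selector.current()       # every request goes here...
second = selector.mark_failed()  # ...until it fails, then all go here
```

In this model the second coordinator receives no traffic while the first is
healthy, which is the limitation this issue is about.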
This poses a problem for Impala, as it makes it difficult to support multiple
coordinators. Hive also has a similar problem.
There are at least two ways to work around this currently in Knox with only
configuration changes:
- Users can set up multiple topologies, one for each Impala coordinator.
- The Impala service definition can be modified to allow specifying a hostname
to direct requests to, the way many UI service definitions allow a "?host=..."
parameter.
Both of these are ugly hacks, and a better solution is needed.
One potential solution would be to add some sort of load balancing
functionality to Knox. Then, if a service has multiple URLs specified in the
topology, requests could be directed to any one of those URLs transparently.
One complication is that Impala clients would need sticky connections:
session and query state is not shared between coordinators in Impala, so a
client that connects to a particular coordinator would need all of its
requests directed to that same coordinator for the duration of its session.
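One way to picture load balancing with sticky sessions is the sketch below.
It is an illustration under stated assumptions, not a proposed Knox design:
the class and method names are hypothetical, round-robin is just one possible
assignment policy, and it assumes each client request carries some session
identifier (for example, a cookie) that the gateway can read.

```python
import itertools


class StickyBalancer:
    """Hypothetical round-robin balancer that pins each session to one URL."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one URL is required")
        self._cycle = itertools.cycle(list(urls))  # round-robin over backends
        self._by_session = {}                      # session id -> pinned URL

    def route(self, session_id):
        """Return the URL for this session, assigning one on first use."""
        if session_id not in self._by_session:
            self._by_session[session_id] = next(self._cycle)
        return self._by_session[session_id]

    def end_session(self, session_id):
        """Forget the pinning once the client's session is closed."""
        self._by_session.pop(session_id, None)


balancer = StickyBalancer(["coordinator-1", "coordinator-2"])
a = balancer.route("session-A")        # new session, takes the first backend
b = balancer.route("session-B")        # new session, takes the next backend
a_again = balancer.route("session-A")  # same session, same backend as before
```

New sessions are spread across coordinators, while every later request in a
session returns to the coordinator that holds its session and query state.
This sketch does not handle a pinned coordinator becoming unavailable.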
--
This message was sent by Atlassian Jira
(v8.3.4#803005)