Thomas Tauber-Marshall created KNOX-2054:
--------------------------------------------

             Summary: Accommodate APIs with multiple endpoints
                 Key: KNOX-2054
                 URL: https://issues.apache.org/jira/browse/KNOX-2054
             Project: Apache Knox
          Issue Type: Improvement
            Reporter: Thomas Tauber-Marshall


A service definition was recently added to support accessing Impala via 
HiveServer2.

One issue here is that Impala has historically used multiple "coordinators", 
nodes that can accept queries. (Originally, every node acted as both a 
coordinator and an executor, though Impala has recently moved to dedicated 
coordinators and executors for scalability reasons.)

In Knox, when you create a topology you can specify multiple URLs for a single 
service, but my understanding is that Knox will only ever direct requests to 
one of those URLs. Only if that URL becomes unavailable is another one chosen, 
and all requests are then directed to it.
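
To make the behavior described above concrete, here is a toy model in Python. 
It is only a sketch of the failover-only selection as I understand it, not 
Knox's actual code: the class name, method names and coordinator URLs are all 
hypothetical.

```python
class FailoverSelector:
    """Toy model: every request goes to one active URL until it fails."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one URL is required")
        self.urls = list(urls)
        self.active = 0  # index of the URL that currently gets all requests

    def current(self):
        # No balancing: the same URL is returned on every call.
        return self.urls[self.active]

    def mark_failed(self):
        # On failure, move to the next URL; all later requests go there.
        self.active = (self.active + 1) % len(self.urls)
        return self.current()


# Placeholder coordinator URLs, for illustration only.
selector = FailoverSelector([
    "http://coordinator-1.example.com:28000",
    "http://coordinator-2.example.com:28000",
])
```

With two coordinators configured, the second one receives no traffic at all 
until the first is marked as failed.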

This poses a problem for Impala, as it makes it difficult to support multiple 
coordinators. Hive also has a similar problem.

There are at least two ways to work around this currently in Knox with only 
configuration changes:
- Users can set up multiple topologies, one for each Impala coordinator.
- The Impala service definition can be modified to allow specifying a hostname 
to direct requests to, the way many UI service definitions allow a "?host=..." 
parameter.
Both of these are ugly hacks and a better solution is needed.

One potential solution would be to add some sort of load balancing 
functionality to Knox. Then, if a service has multiple URLs specified in the 
topology, requests could be directed to any one of those URLs transparently.

One complication is that Impala clients would need to have sticky connections - 
session and query state is not shared between coordinators in Impala, so a 
client that connects to a particular coordinator would need to have all of 
its requests directed to that same coordinator for the duration of its 
session.
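
As a rough illustration of load balancing with sticky sessions, the Python 
sketch below assigns each new session to a coordinator round-robin and then 
keeps routing that session to the same coordinator. It is a minimal sketch 
under assumptions, not a proposed Knox implementation: the class and method 
names are hypothetical, and how a session would actually be identified (a 
cookie, a header, or something else) is left open.

```python
import itertools


class StickyBalancer:
    """Round-robin across URLs for new sessions, sticky for existing ones."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one URL is required")
        self._next_url = itertools.cycle(list(urls))
        self._assigned = {}  # session id -> coordinator URL

    def route(self, session_id):
        # A new session gets the next coordinator in rotation; a known
        # session always gets the coordinator it was first assigned.
        if session_id not in self._assigned:
            self._assigned[session_id] = next(self._next_url)
        return self._assigned[session_id]

    def end_session(self, session_id):
        # Forget the mapping once the client's session is closed.
        self._assigned.pop(session_id, None)


# Placeholder coordinator names, for illustration only.
balancer = StickyBalancer(["coordinator-1", "coordinator-2"])
first = balancer.route("session-a")
second = balancer.route("session-b")
again = balancer.route("session-a")  # same coordinator as `first`
```

This sketch does not handle a coordinator becoming unavailable. Since session 
state is not shared, a session pinned to a failed coordinator could not simply 
be moved to another one without the client reconnecting.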



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
