Shubham Chaurasia created HIVE-24058:
----------------------------------------
Summary: Llap external client - Enhancements for running in cloud
environment
Key: HIVE-24058
URL: https://issues.apache.org/jira/browse/HIVE-24058
Project: Hive
Issue Type: Task
Components: llap
Reporter: Shubham Chaurasia
Assignee: Shubham Chaurasia
When we query through the LLAP external client library, the following currently happens:
1. We first get the splits using
[LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226],
which only needs the HiveServer2 JDBC URL.
2. We then submit those splits to LLAP and obtain a record reader to read the
data using
[LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140].
This step needs the following on the client side:
- {{hive.zookeeper.quorum}}
- {{hive.llap.daemon.service.hosts}}
The client has to connect to ZooKeeper to discover the LLAP daemons.
3. The record reader obtained in step 2 needs to [initiate a TCP connection
from the client to the LLAP daemon to submit the
split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
4. It also needs to [initiate another TCP connection from the client to the
output format port on the LLAP daemon to read the
data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].
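The client-side prerequisites of step 2 can be checked before any split is submitted. The following is a minimal sketch, not part of the LLAP client library: the class {{LlapClientPreflight}} and its method are hypothetical, the configuration is modelled as a plain map rather than a Hadoop {{JobConf}}, and only the two property names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reports which client-side settings for step 2 are absent. */
final class LlapClientPreflight {
    // Property names as listed in the issue description.
    static final String ZK_QUORUM = "hive.zookeeper.quorum";
    static final String LLAP_HOSTS = "hive.llap.daemon.service.hosts";

    /** Returns the required keys that are missing or blank in the given configuration. */
    static List<String> missingKeys(Map<String, String> conf) {
        List<String> missing = new ArrayList<>();
        for (String key : new String[] {ZK_QUORUM, LLAP_HOSTS}) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The quorum address is a placeholder, not a Hive default.
        Map<String, String> conf = Map.of(ZK_QUORUM, "zk1.example.com:2181");
        System.out.println("Missing client-side settings: " + missingKeys(conf));
    }
}
```

A check like this only confirms that the settings are present; it says nothing about whether the client can actually reach ZooKeeper or the daemons over the network.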
In cloud-based deployments, the client might run outside the VPC, so it may not
be able to connect directly to the ZooKeeper registry or to the LLAP daemons.
For 2, we can move the daemon discovery logic into the get_splits UDF itself,
which runs in HS2.
For 3 and 4, we can expose additional ports on LLAP with an additional auth
mechanism.
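One way to picture the proposal for 2 from the client's point of view is sketched below. It is only an illustration under assumptions: {{DaemonEndpointResolver}}, {{Endpoint}} and {{resolve}} are hypothetical names, the host and port are placeholders, and the issue does not specify how HS2 would hand the daemon location back to the client. The idea shown is that when the location arrives with the split, the ZooKeeper lookup is never invoked.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: prefer a daemon location supplied by HS2 over client-side discovery. */
final class DaemonEndpointResolver {

    /** Hypothetical host/port pair identifying an LLAP daemon. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /**
     * Returns the endpoint resolved in HS2 if one was provided; otherwise falls back
     * to the client-side lookup, which stands in for today's ZooKeeper discovery.
     */
    static Endpoint resolve(Endpoint fromHs2, Supplier<Endpoint> zkLookup) {
        if (fromHs2 != null) {
            return fromHs2; // no ZooKeeper access needed on the client
        }
        return zkLookup.get();
    }

    public static void main(String[] args) {
        // Placeholder endpoint; in this sketch it would arrive together with the split.
        Endpoint fromHs2 = new Endpoint("llap-0.example.com", 15001);
        Endpoint chosen = resolve(fromHs2, () -> {
            throw new IllegalStateException("ZooKeeper lookup should not be needed");
        });
        System.out.println("Submitting split to " + chosen);
    }
}
```

Keeping the fallback as a supplier means a client inside the VPC could continue to use the existing discovery path unchanged.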