[ https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shubham Chaurasia reassigned HIVE-24058:
----------------------------------------
> Llap external client - Enhancements for running in cloud environment
> --------------------------------------------------------------------
>
> Key: HIVE-24058
> URL: https://issues.apache.org/jira/browse/HIVE-24058
> Project: Hive
> Issue Type: Task
> Components: llap
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
>
> When we query using the LLAP external client library, the following currently
> happens:
> 1. We first need to get splits using
> [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226];
> this only needs the HiveServer2 JDBC URL.
> 2. We then submit those splits to LLAP and obtain a record reader to read the
> data using
> [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140].
> In this step, we need the following on the client side:
> - {{hive.zookeeper.quorum}}
> - {{hive.llap.daemon.service.hosts}}
> The client needs to connect to ZooKeeper to discover the LLAP daemons.
> 3. The record reader so obtained needs to [initiate a TCP connection from the
> client to an LLAP daemon to submit the
> split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
> 4. It also needs to [initiate another TCP connection from the client to the
> output format port of the LLAP daemon to read the
> data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].
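To make the connectivity requirement of the four steps explicit, here is a toy Java model. `Endpoint`, `Step`, and `LlapClientConnectivity` are hypothetical names invented for this sketch, not Hive classes; the model only records which endpoint each step has to reach, as described in the steps above, and reports which steps would fail for a client that can reach only a given set of endpoints.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; these types are not part of Hive.
enum Endpoint {
    HS2_JDBC,           // HiveServer2 JDBC URL (step 1)
    ZOOKEEPER,          // hive.zookeeper.quorum, used to discover daemons (step 2)
    LLAP_DAEMON_RPC,    // LLAP daemon port that accepts the submitted split (step 3)
    LLAP_OUTPUT_FORMAT  // LLAP daemon output format port that serves the data (step 4)
}

// Each step of the current client flow, tagged with the endpoint it must reach.
enum Step {
    GET_SPLITS(Endpoint.HS2_JDBC),
    DISCOVER_DAEMONS(Endpoint.ZOOKEEPER),
    SUBMIT_SPLIT(Endpoint.LLAP_DAEMON_RPC),
    READ_DATA(Endpoint.LLAP_OUTPUT_FORMAT);

    final Endpoint endpoint;

    Step(Endpoint endpoint) {
        this.endpoint = endpoint;
    }
}

final class LlapClientConnectivity {
    private LlapClientConnectivity() {}

    // Returns, in flow order, the steps whose required endpoint is unreachable.
    static List<Step> blockedSteps(Set<Endpoint> reachable) {
        List<Step> blocked = new ArrayList<>();
        for (Step step : Step.values()) {
            if (!reachable.contains(step.endpoint)) {
                blocked.add(step);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        // A client outside the VPC that can reach only HS2 over JDBC.
        Set<Endpoint> outsideVpc = EnumSet.of(Endpoint.HS2_JDBC);
        System.out.println(blockedSteps(outsideVpc));
    }
}
```

Running `main` prints `[DISCOVER_DAEMONS, SUBMIT_SPLIT, READ_DATA]`: a client that can reach only the HS2 JDBC URL gets through step 1 and nothing else.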
> In cloud-based deployments, the client may not be able to connect directly to
> the ZooKeeper registry and the LLAP daemons, because it might run outside the VPC.
> For step 2, we can move the daemon discovery logic into the get_splits UDF
> itself, which runs in HS2.
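A minimal sketch of the step 2 proposal, assuming HS2 can read the daemon registry and attach each chosen daemon's address to the split it returns, so that the client reads the location from the split instead of contacting ZooKeeper itself. `DaemonAddress`, `DaemonRegistry`, `LocatedSplit`, and `ServerSideSplitPlanner` are hypothetical names, not Hive classes, and round-robin assignment is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: the address an LLAP daemon would advertise in its registry entry.
final class DaemonAddress {
    final String host;
    final int rpcPort;     // port used to submit the split (step 3)
    final int outputPort;  // output format port used to read data (step 4)

    DaemonAddress(String host, int rpcPort, int outputPort) {
        this.host = host;
        this.rpcPort = rpcPort;
        this.outputPort = outputPort;
    }
}

// Hypothetical stand-in for the ZooKeeper-backed registry; only HS2 queries it.
interface DaemonRegistry {
    List<DaemonAddress> liveDaemons();
}

// Hypothetical split that already carries the location of its target daemon.
final class LocatedSplit {
    final int splitNum;
    final DaemonAddress daemon;

    LocatedSplit(int splitNum, DaemonAddress daemon) {
        this.splitNum = splitNum;
        this.daemon = daemon;
    }
}

final class ServerSideSplitPlanner {
    private ServerSideSplitPlanner() {}

    // Intended to run on the HS2 side (e.g. inside get_splits): discovers the
    // daemons and assigns one to each split round-robin, so the client never
    // needs ZooKeeper access.
    static List<LocatedSplit> plan(int numSplits, DaemonRegistry registry) {
        List<DaemonAddress> daemons = registry.liveDaemons();
        if (daemons.isEmpty()) {
            throw new IllegalStateException("no live LLAP daemons registered");
        }
        List<LocatedSplit> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new LocatedSplit(i, daemons.get(i % daemons.size())));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Placeholder hosts and ports, for illustration only.
        DaemonRegistry registry = () -> Arrays.asList(
                new DaemonAddress("llap-0.internal", 15001, 15003),
                new DaemonAddress("llap-1.internal", 15001, 15003));
        for (LocatedSplit s : plan(3, registry)) {
            System.out.println("split " + s.splitNum + " -> " + s.daemon.host);
        }
    }
}
```

With two registered daemons and three splits, the sketch assigns `llap-0.internal`, `llap-1.internal`, then `llap-0.internal` again. Carrying the full address inside the split is only one possible design choice.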
> For steps 3 and 4, we can expose additional ports on the LLAP daemons,
> protected by an additional authentication mechanism.
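The issue leaves the authentication mechanism for the additional ports open. The following is a minimal sketch of one possible scheme, assuming HS2 and the LLAP daemons share a secret key: HS2 attaches an HMAC-SHA256 signature to each split it returns, and the daemon rejects connections on the externally exposed port whose signature does not verify. `SplitTokenAuth` and the split identifier format are hypothetical; this is not Hive's actual mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical scheme for illustration only; not Hive's actual mechanism.
// HS2 and the LLAP daemons are assumed to share `sharedSecret`.
final class SplitTokenAuth {
    private static final String ALGORITHM = "HmacSHA256";
    private final SecretKeySpec key;

    SplitTokenAuth(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, ALGORITHM);
    }

    // Computes the raw HMAC tag over the split identifier.
    private byte[] tag(String splitId) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            return mac.doFinal(splitId.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    // HS2 side: issue a token alongside the split returned by get_splits.
    String sign(String splitId) {
        return Base64.getEncoder().encodeToString(tag(splitId));
    }

    // Daemon side: check the token before serving requests on the external port.
    boolean verify(String splitId, String token) {
        byte[] presented;
        try {
            presented = Base64.getDecoder().decode(token);
        } catch (IllegalArgumentException badEncoding) {
            return false; // malformed token
        }
        // MessageDigest.isEqual does not stop at the first mismatching byte.
        return MessageDigest.isEqual(tag(splitId), presented);
    }

    public static void main(String[] args) {
        SplitTokenAuth auth =
                new SplitTokenAuth("demo-secret".getBytes(StandardCharsets.UTF_8));
        String token = auth.sign("query-1/split-0");
        System.out.println(auth.verify("query-1/split-0", token)); // true
        System.out.println(auth.verify("query-1/split-1", token)); // false
    }
}
```

A real design would likely also bind the token to an expiry and to the requesting user, and would need a secure way to distribute the shared key; the sketch leaves those out.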
--
This message was sent by Atlassian Jira
(v8.3.4#803005)