Shubham Chaurasia created HIVE-24058:
----------------------------------------

             Summary: Llap external client - Enhancements for running in cloud environment
                 Key: HIVE-24058
                 URL: https://issues.apache.org/jira/browse/HIVE-24058
             Project: Hive
          Issue Type: Task
          Components: llap
            Reporter: Shubham Chaurasia
            Assignee: Shubham Chaurasia


When we query using the LLAP external client library, the following currently happens:

1. We first need to get splits using [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226]. This needs only the Hive server JDBC URL.
2. We then submit those splits to LLAP and obtain a record reader to read the data using [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140]. In this step the client needs the following settings:
- {{hive.zookeeper.quorum}}
- {{hive.llap.daemon.service.hosts}}

The client needs to connect to ZooKeeper to discover the LLAP daemons.
3. The record reader so obtained needs to [initiate a TCP connection from client to LLAP Daemon to submit the split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
4. It also needs to [initiate another TCP connection from client to output format port in LLAP Daemon to read the data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].

In cloud-based deployments, we may not be able to make direct connections from the client to the ZooKeeper registry and the LLAP daemons, because the client might run outside the VPC.

For 2, we can move the daemon discovery logic into the get_splits UDF itself, which runs in HS2.

For scenarios like 3 and 4, we can expose additional ports on LLAP with an additional auth mechanism.
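
To make the connectivity problem concrete, here is a minimal Java sketch of which endpoints the client has to reach directly in the current flow compared with the proposed one. It is only an illustrative model, not Hive code: the class {{LlapClientEndpoints}}, its methods, the map keys, and all host names are hypothetical, and no ports for the proposed additional LLAP endpoints are implied by the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only: names, keys, and addresses are assumptions, not Hive APIs.
public class LlapClientEndpoints {

    // Endpoints the client must reach directly today (steps 1 to 4 in the description).
    static Map<String, String> currentFlow(String hs2JdbcUrl, String zkQuorum,
                                           String llapSubmitAddr, String llapOutputAddr) {
        Map<String, String> endpoints = new LinkedHashMap<>();
        endpoints.put("getSplits", hs2JdbcUrl);        // step 1: HS2 over JDBC
        endpoints.put("daemonDiscovery", zkQuorum);    // step 2: hive.zookeeper.quorum
        endpoints.put("submitSplit", llapSubmitAddr);  // step 3: TCP to LLAP daemon
        endpoints.put("readData", llapOutputAddr);     // step 4: TCP to output format port
        return endpoints;
    }

    // Endpoints under the proposal: discovery moves into get_splits on HS2, so the
    // client no longer contacts ZooKeeper; steps 3 and 4 go through additionally
    // exposed (and authenticated) LLAP ports.
    static Map<String, String> proposedFlow(String hs2JdbcUrl,
                                            String exposedSubmitAddr,
                                            String exposedOutputAddr) {
        Map<String, String> endpoints = new LinkedHashMap<>();
        endpoints.put("getSplits", hs2JdbcUrl);
        endpoints.put("submitSplit", exposedSubmitAddr);
        endpoints.put("readData", exposedOutputAddr);
        return endpoints;
    }

    public static void main(String[] args) {
        // All addresses below are placeholders for illustration.
        Map<String, String> current = currentFlow(
                "jdbc:hive2://hs2.example.com:10000/default",
                "zk1.example.com:2181,zk2.example.com:2181",
                "llap-daemon.internal:<submit-port>",
                "llap-daemon.internal:<output-format-port>");
        Map<String, String> proposed = proposedFlow(
                "jdbc:hive2://hs2.example.com:10000/default",
                "llap-gateway.example.com:<exposed-submit-port>",
                "llap-gateway.example.com:<exposed-output-port>");

        current.forEach((step, addr) -> System.out.println("current  " + step + " -> " + addr));
        proposed.forEach((step, addr) -> System.out.println("proposed " + step + " -> " + addr));
    }
}
```

In this model the only difference the client sees is that the {{daemonDiscovery}} entry disappears and the two LLAP entries point at externally reachable addresses; how those ports are exposed and authenticated is left open by the issue.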