[ https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shubham Chaurasia reassigned HIVE-24058:
----------------------------------------

> Llap external client - Enhancements for running in cloud environment
> --------------------------------------------------------------------
>
>                 Key: HIVE-24058
>                 URL: https://issues.apache.org/jira/browse/HIVE-24058
>             Project: Hive
>          Issue Type: Task
>          Components: llap
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>
> When we query using the LLAP external client library, the following currently happens:
> 1. We first need to get splits using [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226]; this needs only the Hive server JDBC URL.
> 2. We then submit those splits to LLAP and obtain a record reader to read the data using [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140]. In this step we need the following on the client side:
> - {{hive.zookeeper.quorum}}
> - {{hive.llap.daemon.service.hosts}}
> We need to connect to ZooKeeper to discover the LLAP daemons.
> 3. The record reader so obtained needs to [initiate a TCP connection from the client to the LLAP daemon to submit the split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
> 4. It also needs to [initiate another TCP connection from the client to the output format port in the LLAP daemon to read the data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].
> In cloud-based deployments, we may not be able to make direct connections to the ZooKeeper registry and the LLAP daemons from the client, as it might run outside the VPC.
> For 2, we can move the daemon discovery logic into the get_splits UDF itself, which runs in HS2.
> For scenarios like 3 and 4, we can expose additional ports on LLAP with an additional auth mechanism.
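
The four steps and the proposed changes can be summarised as a small model of which endpoints the client itself has to reach. This is only an illustrative sketch, written in Python as neutral notation: the endpoint labels, the `Step` class and the function names are hypothetical and are not Hive identifiers. It also assumes that, once discovery moves into get_splits, the daemon locations reach the client together with the splits; the issue does not say how that information is returned.

```python
from dataclasses import dataclass

# Endpoint labels are illustrative names, not Hive identifiers.
HS2 = "HiveServer2 (JDBC)"
ZK = "ZooKeeper registry"
LLAP_SUBMIT = "LLAP daemon (split submission)"
LLAP_OUTPUT = "LLAP daemon (output format port)"
LLAP_SUBMIT_EXPOSED = "LLAP daemon (additional exposed port for submission, extra auth)"
LLAP_OUTPUT_EXPOSED = "LLAP daemon (additional exposed port for output, extra auth)"


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    client_connects_to: tuple  # endpoints the client itself must reach


def current_flow():
    """Steps 1-4 as the issue describes them for the existing client."""
    return [
        Step(1, "get splits", (HS2,)),
        Step(2, "discover LLAP daemons", (ZK,)),
        Step(3, "submit split", (LLAP_SUBMIT,)),
        Step(4, "read data", (LLAP_OUTPUT,)),
    ]


def proposed_flow():
    """The same steps with the proposed changes applied.

    Assumption: daemon locations travel back to the client with the splits,
    so step 2 needs no client-side connection at all.
    """
    return [
        Step(1, "get splits", (HS2,)),
        Step(2, "discover LLAP daemons (inside get_splits, in HS2)", ()),
        Step(3, "submit split", (LLAP_SUBMIT_EXPOSED,)),
        Step(4, "read data", (LLAP_OUTPUT_EXPOSED,)),
    ]


def client_endpoints(flow):
    """Distinct endpoints the client must reach, in order of first use."""
    seen = []
    for step in flow:
        for endpoint in step.client_connects_to:
            if endpoint not in seen:
                seen.append(endpoint)
    return seen


if __name__ == "__main__":
    for name, flow in (("current", current_flow()), ("proposed", proposed_flow())):
        print(name, "->", client_endpoints(flow))
```

In this model the ZooKeeper registry drops out of the client's endpoint list under the proposal, and the two direct daemon connections are replaced by the additionally exposed, authenticated ports.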