[ 
https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia reassigned HIVE-24058:
----------------------------------------


> Llap external client - Enhancements for running in cloud environment
> --------------------------------------------------------------------
>
>                 Key: HIVE-24058
>                 URL: https://issues.apache.org/jira/browse/HIVE-24058
>             Project: Hive
>          Issue Type: Task
>          Components: llap
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>
> When we query using the LLAP external client library, the following currently 
> happens:
> 1. We first need to get splits using 
> [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226],
>  which needs only the HiveServer2 JDBC URL. 
> 2. We then submit those splits to LLAP and obtain a record reader to read the data 
> using 
> [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140].
>  In this step the client needs the following settings:
> - {{hive.zookeeper.quorum}}
> - {{hive.llap.daemon.service.hosts}}
> The client has to connect to ZooKeeper to discover the LLAP daemons.
> 3. The record reader thus obtained needs to [initiate a TCP connection from the client 
> to the LLAP daemon to submit the 
> split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185].
> 4. It also needs to [initiate another TCP connection from the client to the output 
> format port on the LLAP daemon to read the 
> data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201].
> In cloud-based deployments, the client might run outside the VPC, so it may not 
> be able to connect directly to the ZooKeeper registry or the LLAP daemons. 
> For step 2, we can move the daemon discovery logic into the get_splits UDF itself, 
> which runs in HS2.  
> For steps 3 and 4, we can expose additional ports on LLAP with 
> an additional authentication mechanism.
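
To make the client-side requirements of steps 1 and 2 concrete, here is a minimal sketch that reports which settings a client is missing for each call. It is illustrative only and does not use the Hive API: the class `LlapClientConfigCheck`, its methods, and the key `example.hs2.jdbc.url` are hypothetical names introduced for this example. Only the two `hive.*` property names come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not part of the Hive code base. */
final class LlapClientConfigCheck {

    /** The two client-side calls described in steps 1 and 2. */
    enum Step { GET_SPLITS, GET_RECORD_READER }

    // Hypothetical placeholder for wherever the HiveServer2 JDBC URL is configured.
    static final String HS2_JDBC_URL = "example.hs2.jdbc.url";
    // Property names quoted from the issue description.
    static final String ZK_QUORUM = "hive.zookeeper.quorum";
    static final String LLAP_SERVICE_HOSTS = "hive.llap.daemon.service.hosts";

    /** Settings the client needs for a step in the current flow. */
    static List<String> requiredKeys(Step step) {
        return step == Step.GET_SPLITS
                ? Arrays.asList(HS2_JDBC_URL)
                : Arrays.asList(ZK_QUORUM, LLAP_SERVICE_HOSTS);
    }

    /** Returns the required keys that are absent or blank in conf. */
    static List<String> missingKeys(Map<String, String> conf, Step step) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys(step)) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A client outside the VPC that only knows the HS2 endpoint.
        Map<String, String> conf = new HashMap<>();
        conf.put(HS2_JDBC_URL, "jdbc:hive2://hs2.example.com:10000");

        for (Step step : Step.values()) {
            System.out.println(step + " missing: " + missingKeys(conf, step));
        }
    }
}
```

If daemon discovery moves into the get_splits UDF as proposed, a client like the one in `main` would presumably no longer need the two ZooKeeper-related settings for the record reader step. That is the reading assumed here, not a statement about the final implementation.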



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
