[ 
https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amruth S updated HIVE-18338:
----------------------------
    Description: 
This exposes an asynchronous API in HiveStatement (jdbc module).

The JDBC interface has always had strictly synchronous APIs, so the Hive JDBC
implementation had to follow suit even though the Hive server is fully
asynchronous.

Developers trying to build proxies on top of Hive servers end up writing a
Thrift client from scratch to make the proxy asynchronous and robust to its own
restarts.
The common pattern is:
 # Submit the query, get the operation handle, and save it in a persistent store
 # Poll and wait for completion
 # Stream results
 # In the event of a restart, restore the OperationHandle from the persistent
store and continue execution.
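
The four steps above can be sketched roughly as follows. This is only an
illustration and not code from the patch: {{FakeServer}}, its methods, and the
in-memory map standing in for the persistent store are all hypothetical, and no
real Hive or Thrift classes are used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AsyncQueryPattern {

    /** Hypothetical stand-in for an asynchronous query server (not a Hive class). */
    static class FakeServer {
        private final Map<String, Integer> pollsLeft = new HashMap<>();
        private int nextId = 0;

        /** Step 1: submit the query and return an opaque operation handle. */
        String submit(String sql) {
            String handle = "op-" + (nextId++);
            pollsLeft.put(handle, 2); // pretend the query needs two polls to finish
            return handle;
        }

        /** Step 2: a single poll; returns true once the operation has finished. */
        boolean isComplete(String handle) {
            Integer left = pollsLeft.get(handle);
            if (left == null) {
                throw new IllegalArgumentException("unknown handle: " + handle);
            }
            if (left > 0) {
                pollsLeft.put(handle, left - 1);
                return false;
            }
            return true;
        }

        /** Step 3: return the rows of a finished operation. */
        List<String> fetchResults(String handle) {
            Integer left = pollsLeft.get(handle);
            if (left == null || left > 0) {
                throw new IllegalStateException("not finished: " + handle);
            }
            return Arrays.asList("row-1", "row-2");
        }
    }

    /** Runs the whole pattern; the map plays the role of the persistent store. */
    static List<String> runWithRestart(FakeServer server, Map<String, String> store, String sql) {
        // Step 1: submit and persist the handle before doing anything else.
        store.put("handle", server.submit(sql));

        // Step 4: simulate a client restart, after which only the store survives.
        String restored = store.get("handle");

        // Step 2: poll until done (a real client would sleep or back off here).
        while (!server.isComplete(restored)) {
            // busy-wait only because the fake server finishes after two polls
        }

        // Step 3: stream the results.
        return server.fetchResults(restored);
    }

    public static void main(String[] args) {
        System.out.println(runWithRestart(new FakeServer(), new HashMap<>(), "SELECT 1"));
    }
}
```

The point of the sketch is that the handle is written to the store before any
polling starts, so a restarted client needs nothing except the stored handle to
continue.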

The patch does two things:
 * Exposes the operation handle (once a query is submitted) via
{{getOperationhandle()}}.
Developers can persist this along with the actual Hive server URL
({{getJdbcUrl}}).
 * Adds latch APIs.
Developers can create a statement and latch on to an operation handle that was
persisted earlier. To latch, the statement should be created from a connection
to the same Hive server URI on which the query was submitted.
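
The constraint on latching can be illustrated with a small model. The sketch
below does not use the patch's classes or signatures: {{SketchStatement}},
{{FakeServerInstance}}, {{PersistedQuery}}, and the {{latch}} method are
hypothetical names chosen for illustration, and the host names are made up. It
only shows why the server URL is persisted next to the handle: a handle is
meaningful only on the server instance that issued it.

```java
import java.util.HashSet;
import java.util.Set;

public class LatchSketch {

    /** What a client would persist: the server URL together with the handle. */
    static final class PersistedQuery {
        final String jdbcUrl;
        final String operationHandle;

        PersistedQuery(String jdbcUrl, String operationHandle) {
            this.jdbcUrl = jdbcUrl;
            this.operationHandle = operationHandle;
        }
    }

    /** Hypothetical stand-in for a single server instance (not a Hive class). */
    static final class FakeServerInstance {
        final String jdbcUrl;
        private final Set<String> liveHandles = new HashSet<>();
        private int nextId = 0;

        FakeServerInstance(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        String submit(String sql) {
            String handle = "op-" + (nextId++);
            liveHandles.add(handle);
            return handle;
        }

        boolean knows(String handle) {
            return liveHandles.contains(handle);
        }
    }

    /** Hypothetical statement, loosely modeled on the methods named above. */
    static final class SketchStatement {
        private final FakeServerInstance server;
        private String operationHandle;

        SketchStatement(FakeServerInstance server) {
            this.server = server;
        }

        void executeAsync(String sql) {
            operationHandle = server.submit(sql);
        }

        String getOperationHandle() {
            return operationHandle;
        }

        String getJdbcUrl() {
            return server.jdbcUrl;
        }

        /** Attach this statement to an operation that was persisted earlier. */
        void latch(PersistedQuery saved) {
            if (!server.jdbcUrl.equals(saved.jdbcUrl)) {
                throw new IllegalArgumentException(
                        "handle was issued by " + saved.jdbcUrl + ", not " + server.jdbcUrl);
            }
            if (!server.knows(saved.operationHandle)) {
                throw new IllegalStateException("no such operation: " + saved.operationHandle);
            }
            operationHandle = saved.operationHandle;
        }
    }

    public static void main(String[] args) {
        FakeServerInstance server = new FakeServerInstance("jdbc:hive2://host-a:10000");
        SketchStatement first = new SketchStatement(server);
        first.executeAsync("SELECT 1");
        PersistedQuery saved = new PersistedQuery(first.getJdbcUrl(), first.getOperationHandle());

        // After a client restart: a new statement on the same server latches on.
        SketchStatement resumed = new SketchStatement(server);
        resumed.latch(saved);
        System.out.println("latched on to " + resumed.getOperationHandle());
    }
}
```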

  was:
Many users struggle with this and end up rewriting a lot of boilerplate over
Thrift to get purely asynchronous capability.

The idea is to expose the operation handle so that clients can persist it and
later latch on to the same execution.

*Problem statement*

Hive JDBC currently exposes two methods related to asynchronous execution:
*executeAsync()* - triggers a query execution and returns immediately.
*waitForOperationToComplete()* - waits until the current execution is
complete, *blocking the user thread*.

This has one problem:

If the client process goes down, there is no way to resume queries, even though
the Hive server is completely asynchronous.

*Proposal*

If the operation handle were exposed, a client could latch on to an active
execution of a query.

*Code changes*

The operation handle is exposed, so the client can keep a copy.
The latchSync() and latchAsync() methods take an operation handle and try to
latch on to the current execution in the Hive server, if present.


> [Client, JDBC] Expose async interface through hive JDBC.
> --------------------------------------------------------
>
>                 Key: HIVE-18338
>                 URL: https://issues.apache.org/jira/browse/HIVE-18338
>             Project: Hive
>          Issue Type: Improvement
>          Components: Clients, JDBC
>    Affects Versions: 2.3.2
>            Reporter: Amruth S
>            Assignee: Amruth S
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-18338.patch, HIVE-18338.patch.1, 
> HIVE-18338.patch.2, HIVE-18338.patch.3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
