[
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015634#comment-15015634
]
Takanobu Asanuma commented on HIVE-11527:
-----------------------------------------
I'd like to share my thoughts on this work.
I'm going to add a new Thrift API that returns the URI of the result data. The
new data flow would be as follows:
1. When the JDBC client calls HiveQueryResultSet#next(), the JDBC driver calls
the new Thrift API.
2. HiveServer2 returns the path of the result data. IIUC, FetchWork has the URI.
NOTE: Some queries, such as "select * from tablename limit 10", don't run
MR/Tez/Spark jobs, so the URI does not point to the final result data. In this
case, we use the current implementation.
3. The JDBC driver gets the URI and downloads the data via WebHDFS.
4. The JDBC driver decodes the data and creates a RowSet.
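To make steps 2 and 3 concrete, here is a minimal sketch of the client-side decision and the URL mapping. It is only an illustration under stated assumptions, not the patch itself: the class and method names (ResultUriSketch, useDirectFetch, toWebHdfsOpenUrl) are hypothetical, and the host, port, and path values are made up. The only external fact it relies on is the WebHDFS REST layout, http://<host>:<port>/webhdfs/v1/<path>?op=OPEN.

```java
import java.net.URI;

// Hypothetical sketch; none of these names exist in Hive or its JDBC driver.
final class ResultUriSketch {

    // Step 2 NOTE: if the server gives no usable URI (e.g. the query ran
    // without an MR/Tez/Spark job), keep using the current Thrift fetch path.
    public static boolean useDirectFetch(String resultUri) {
        return resultUri != null && !resultUri.isEmpty();
    }

    // Step 3: map an hdfs:// result URI to a WebHDFS OPEN URL.
    // httpHost/httpPort are the NameNode HTTP endpoint, which is assumed to be
    // supplied by configuration; it is not derived from the hdfs:// authority.
    public static String toWebHdfsOpenUrl(String resultUri, String httpHost, int httpPort) {
        URI uri = URI.create(resultUri);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an HDFS URI: " + resultUri);
        }
        return "http://" + httpHost + ":" + httpPort
                + "/webhdfs/v1" + uri.getRawPath() + "?op=OPEN";
    }

    public static void main(String[] args) {
        // Made-up example values.
        String resultUri = "hdfs://namenode:8020/tmp/hive/result/000000_0";
        if (useDirectFetch(resultUri)) {
            System.out.println(toWebHdfsOpenUrl(resultUri, "namenode", 50070));
        } else {
            System.out.println("fall back to the current Thrift fetch");
        }
    }
}
```

The driver would then issue an HTTP GET on that URL and follow the redirect to a DataNode. Decoding the downloaded bytes into a RowSet (step 4) depends on the result file format and is left out here.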
I'm writing the code and will upload a WIP patch next week. If you have any
thoughts on this JIRA, please share them with me.
> bypass HiveServer2 thrift interface for query results
> -----------------------------------------------------
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sergey Shelukhin
> Assignee: Takanobu Asanuma
>
> Right now, HS2 reads query results and returns them to the caller via its
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS
> link?) and for the user to read the results directly off HDFS inside the
> cluster, or via something like WebHDFS outside the cluster.