[
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242862#comment-15242862
]
Takanobu Asanuma commented on HIVE-11527:
-----------------------------------------
Hi, [~sershe], [~vgumashta], and other experts.
I just uploaded a new patch to Review Board. I think I have almost finished
implementing the features, so I'd like to summarize the implementation.
*How to use the bypass*
When {{hive.server2.webhdfs.bypass.enabled}} is true, users can use the bypass.
The default is false.
*Tests*
I added unit tests to {{TestJdbcWithMiniHS2}}, {{TestJdbcWithMiniMr}} and
{{TestJdbcWithMiniHA}}. They should help with debugging.
*Changes to the Thrift API*
I added three optional fields to the response that HS2 returns to JDBC drivers
after executing a query.
* {{finalDirUri}}: the path of the directory that holds the final data
* {{haConf}}: configurations for Namenode HA
* {{typeName}}: a type name for complex columns
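As a rough illustration of how a client could treat the three optional fields, here is a minimal sketch. It is not taken from the patch: {{BypassInfo}} is a hypothetical plain-Java stand-in for the generated Thrift class, the field types are assumptions, and the rule "the bypass is offered only when {{finalDirUri}} is present" is also an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the optional response fields; NOT the generated Thrift class.
final class BypassInfo {
    private final String finalDirUri;         // assumed null when the server does not offer the bypass
    private final Map<String, String> haConf; // assumed null when the NameNode is not HA
    private final String typeName;            // assumed null when there are no complex columns

    BypassInfo(String finalDirUri, Map<String, String> haConf, String typeName) {
        this.finalDirUri = finalDirUri;
        this.haConf = haConf;
        this.typeName = typeName;
    }

    Optional<String> finalDirUri() { return Optional.ofNullable(finalDirUri); }
    Optional<Map<String, String>> haConf() { return Optional.ofNullable(haConf); }
    Optional<String> typeName() { return Optional.ofNullable(typeName); }

    // Assumption: a missing finalDirUri means the client keeps fetching rows over Thrift.
    boolean offersBypass() { return finalDirUri != null; }
}
```

Because the fields are optional, a client that does not know about them can ignore them and keep reading results through the existing Thrift path.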
*Decoding data*
Client-side decoding is implemented in {{HiveQueryResultSet}}. To keep the code
simple, the latest patch lets clients use the bypass only when the final data
is a SequenceFile, which is the default format for final data. I think clients
rarely change the default format.
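For reference, a SequenceFile can be recognized by its header: the file starts with the ASCII bytes {{SEQ}} followed by a one-byte version number. The sketch below checks that header on the first bytes of a file. It is only an illustration; {{SequenceFileSniffer}} is a hypothetical helper, and the patch may detect the format in a different way.

```java
// Hypothetical helper, not part of the patch.
final class SequenceFileSniffer {
    // Returns true when the buffer starts with the SequenceFile magic:
    // 'S', 'E', 'Q', then one version byte (its value is not checked here).
    static boolean hasSequenceFileMagic(byte[] firstBytes) {
        return firstBytes != null
                && firstBytes.length >= 4
                && firstBytes[0] == 'S'
                && firstBytes[1] == 'E'
                && firstBytes[2] == 'Q';
    }
}
```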
*Handling HA*
When the NameNode is configured for HA, clients need some configuration that
lives on the cluster side. It is passed in {{Driver#getFinalDirName}}.
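To give an idea of what such cluster-side configuration usually looks like, the sketch below builds the standard HDFS HA client properties for one nameservice. The property names are the regular Hadoop keys; which keys the patch actually sends is an assumption, and {{HaClientConf}}, the nameservice ID and the host names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch.
final class HaClientConf {
    // nnHttpAddresses maps a NameNode ID (e.g. "nn1") to its HTTP host:port.
    static Map<String, String> build(String nameservice, Map<String, String> nnHttpAddresses) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("dfs.nameservices", nameservice);
        conf.put("dfs.ha.namenodes." + nameservice,
                String.join(",", nnHttpAddresses.keySet()));
        for (Map.Entry<String, String> e : nnHttpAddresses.entrySet()) {
            conf.put("dfs.namenode.http-address." + nameservice + "." + e.getKey(),
                    e.getValue());
        }
        // Failover proxy provider class configured for the nameservice.
        conf.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> nns = new LinkedHashMap<>();
        nns.put("nn1", "nn1.example.com:50070"); // hypothetical hosts
        nns.put("nn2", "nn2.example.com:50070");
        build("mycluster", nns).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```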
*Cases where the bypass cannot be used*
In some cases the bypass is difficult to use. I covered those cases in
{{TestJdbcWithMiniHS2#testUnableUseBypassCase}}. {{Driver#useBypass}} decides
whether clients use the bypass.
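The shape of such a decision can be sketched as below. This is not the code of {{Driver#useBypass}}: {{BypassDecision}} is hypothetical and only encodes the two conditions stated in this comment (the config flag and the SequenceFile restriction), while the real method also has to handle the cases listed in the test. Passing the format as a string is an assumption as well.

```java
// Hypothetical sketch; the real Driver#useBypass checks more cases.
final class BypassDecision {
    enum Result { USE_BYPASS, DISABLED_BY_CONFIG, UNSUPPORTED_FORMAT }

    // bypassEnabled: value of hive.server2.webhdfs.bypass.enabled
    // finalFileFormat: format name of the final data (assumed to be a plain string)
    static Result decide(boolean bypassEnabled, String finalFileFormat) {
        if (!bypassEnabled) {
            return Result.DISABLED_BY_CONFIG;
        }
        if (!"SequenceFile".equalsIgnoreCase(finalFileFormat)) {
            return Result.UNSUPPORTED_FORMAT;
        }
        return Result.USE_BYPASS;
    }
}
```

Any result other than {{USE_BYPASS}} would mean the results are returned through the Thrift API as before.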
Some bugs and room for optimization may remain. Please review the patch when
you have time.
Thank you very much for reading this long comment!
> bypass HiveServer2 thrift interface for query results
> -----------------------------------------------------
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Reporter: Sergey Shelukhin
> Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS
> link?) and for the user to read the results directly off HDFS inside the
> cluster, or via something like WebHDFS outside the cluster.
> Review board link: https://reviews.apache.org/r/40867