Paul Rogers created DRILL-7606:
----------------------------------
Summary: Support Hive client and JDBC APIs
Key: DRILL-7606
URL: https://issues.apache.org/jira/browse/DRILL-7606
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers
Both Hive and Impala implement the server-side protocol for the [Hive client
API|https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC].
Some internals documentation is
[here|https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview],
and the protocol's messages are defined in the [Thrift message
definition|https://github.com/apache/hive/blob/master/service-rpc/if/TCLIService.thrift].
Using the Hive client has a number of advantages:
* Maintained by the Hive and Impala projects, so we benefit from shared
investment.
* Does not depend on Drill's internals (Netty, value vectors, direct
memory allocation, and so on).
* Already supported by many tools.
* Comes with the "Beeline" command line tool (similar to SqlLine).
* The API is versioned, allowing easier client upgrades than Drill's
unversioned network API.
* Returns data in a row-oriented format better suited to JDBC clients than
Drill's (potentially large, direct-memory based) value vectors.
* Passes session options along with each query to allow the server to be
stateless and to allow round-robin distribution of requests to servers.
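The last bullet can be sketched without any Thrift code. The following is a minimal illustration only, assuming a request object that carries its own session options; `QueryRequest`, `RoundRobinDispatcher` and the server names are hypothetical and are not part of the Hive or Drill APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical request: the SQL text plus every session option it needs. */
final class QueryRequest {
    final String sql;
    final Map<String, String> sessionOptions;

    QueryRequest(String sql, Map<String, String> sessionOptions) {
        this.sql = sql;
        // Defensive copy so the request is self-contained and immutable.
        this.sessionOptions = Collections.unmodifiableMap(new HashMap<>(sessionOptions));
    }
}

/** Hypothetical dispatcher that rotates through a fixed list of servers. */
final class RoundRobinDispatcher {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDispatcher(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    /**
     * Picks the next server in rotation. The request is not inspected:
     * because it carries its own options, any server can handle it and
     * no session affinity is required.
     */
    String route(QueryRequest request) {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

With two servers, three consecutive calls to `route` go to the first, the second, and then the first again, regardless of which options each request carries.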
The Hive API may not be a perfect fit: Hive assumes the existence of a
metastore such as the Hive Metastore (HMS), which Drill does not require.
Still, adopting it may be a better option than trying to improve Drill's
existing client API.
A pilot approach would be to implement a Thrift server (perhaps borrowing Hive
code) that turns around and uses the Drill client API to talk to the Drill
server. If this "shim" server proves the concept, the code can move into the
Drillbit itself.
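
The shape of such a shim can be sketched as follows. This is only an illustration under stated assumptions: `ColumnarBackend` is a hypothetical stand-in for the Drill client API, `ShimServer` is a hypothetical name, and the Thrift transport is omitted entirely. It shows the two steps described above: delegate the query to the backend, then pivot the column-oriented result into rows for the client.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Drill client API: one list per column. */
interface ColumnarBackend {
    List<List<Object>> runQuery(String sql);
}

/** Hypothetical shim: accepts a query, delegates it, and returns rows. */
final class ShimServer {
    private final ColumnarBackend backend;

    ShimServer(ColumnarBackend backend) {
        this.backend = backend;
    }

    /** Runs the query on the backend and converts the result to rows. */
    List<List<Object>> execute(String sql) {
        return pivot(backend.runQuery(sql));
    }

    /** Pivots equal-length columns into a list of rows. */
    static List<List<Object>> pivot(List<List<Object>> columns) {
        List<List<Object>> rows = new ArrayList<>();
        if (columns.isEmpty()) {
            return rows;
        }
        int rowCount = columns.get(0).size();
        for (List<Object> column : columns) {
            if (column.size() != rowCount) {
                throw new IllegalArgumentException("columns differ in length");
            }
        }
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = new ArrayList<>(columns.size());
            for (List<Object> column : columns) {
                row.add(column.get(r));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Keeping the backend behind an interface means the same shim logic could later run inside the Drillbit by swapping in an implementation that calls the engine directly instead of going through the client.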