Paul Rogers created DRILL-7606:
----------------------------------

             Summary: Support Hive client and JDBC APIs
                 Key: DRILL-7606
                 URL: https://issues.apache.org/jira/browse/DRILL-7606
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.17.0
            Reporter: Paul Rogers


Both Hive and Impala implement the server-side protocol for the [Hive client 
API|https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC]. 
Some documentation of the internals is 
[here|https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview], 
and the Thrift message definitions are in 
[TCLIService.thrift|https://github.com/apache/hive/blob/master/service-rpc/if/TCLIService.thrift]. 
Using the Hive client has a number of advantages:

* Maintained by the Hive and Impala projects, so we benefit from shared 
investment.
* Does not depend on Drill's internals (such as Netty, value vectors, direct 
memory allocation, etc.)
* Already supported by many tools.
* Comes with the "Beeline" command-line tool (similar to SqlLine).
* The API is versioned, allowing easier client upgrades than Drill's 
unversioned network API.
* Returns data in a row-oriented format better suited to JDBC clients than 
Drill's (potentially large, direct-memory based) value vectors.
* Passes session options along with each query to allow the server to be 
stateless and to allow round-robin distribution of requests to servers.
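To illustrate the last point, here is a minimal sketch of why per-request session options make round-robin routing safe. Every type and name below (QueryRequest, RoundRobinRouter, the server names) is hypothetical and exists in neither Drill nor Hive; the option planner.enable_hashjoin is used only as a sample value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: none of these types exist in Drill or Hive.
final class StatelessRoutingSketch {

    /** A request that carries its own session options, so any server can run it. */
    static final class QueryRequest {
        final String sql;
        final Map<String, String> sessionOptions;

        QueryRequest(String sql, Map<String, String> sessionOptions) {
            this.sql = sql;
            this.sessionOptions = new LinkedHashMap<>(sessionOptions);
        }
    }

    /** Picks servers in rotation; safe because no server holds session state. */
    static final class RoundRobinRouter {
        private final List<String> servers;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinRouter(List<String> servers) {
            if (servers.isEmpty()) {
                throw new IllegalArgumentException("at least one server is required");
            }
            this.servers = new ArrayList<>(servers);
        }

        String pick() {
            // floorMod keeps the index valid even if the counter overflows.
            int index = Math.floorMod(next.getAndIncrement(), servers.size());
            return servers.get(index);
        }
    }

    /** Stand-in for a server: it sees only what the request itself carries. */
    static String execute(String server, QueryRequest request) {
        return server + " ran [" + request.sql + "] with " + request.sessionOptions;
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        servers.add("server-1");
        servers.add("server-2");
        RoundRobinRouter router = new RoundRobinRouter(servers);

        Map<String, String> options = new LinkedHashMap<>();
        options.put("planner.enable_hashjoin", "false"); // sample option value

        // The same options travel with every request, whichever server is picked.
        for (int i = 0; i < 3; i++) {
            QueryRequest request = new QueryRequest("SELECT 1", options);
            System.out.println(execute(router.pick(), request));
        }
    }
}
```

Because each request is self-describing, the router needs no session affinity: consecutive queries from one client can land on different servers and still see the same options.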

The Hive API may not be a perfect fit: Hive assumes the existence of a 
metastore such as HMS. Still, this may be a better option than trying to 
improve the existing API.

A pilot approach would be to implement a Thrift server (perhaps borrowing Hive 
code) that turns around and uses the Drill client API to talk to the Drill 
server. If this "shim" server proves the concept, the code can move into the 
Drillbit itself.
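A rough sketch of the shim's shape, under stated assumptions: DrillBackend is a hypothetical stand-in for the Drill client API, and ShimHandler only loosely mirrors the ExecuteStatement and FetchResults calls defined in TCLIService.thrift. A real shim would implement the Thrift-generated service interface and deal with sessions, operation handles, and types, all of which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a real shim would implement the Thrift-generated interface.
final class HiveShimSketch {

    /** Hypothetical stand-in for the Drill client API; returns one columnar batch. */
    interface DrillBackend {
        List<List<Object>> runQuery(String sql); // outer list = columns
    }

    /** Pivots a columnar batch into rows, the shape a row-oriented client expects. */
    static List<List<Object>> toRows(List<List<Object>> columns) {
        List<List<Object>> rows = new ArrayList<>();
        int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = new ArrayList<>(columns.size());
            for (List<Object> column : columns) {
                row.add(column.get(r));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Receives client calls and forwards them to the backend. */
    static final class ShimHandler {
        private final DrillBackend backend;
        private List<List<Object>> pending = new ArrayList<>();

        ShimHandler(DrillBackend backend) {
            this.backend = backend;
        }

        /** Runs the query on the backend and buffers the result as rows. */
        void executeStatement(String sql) {
            pending = toRows(backend.runQuery(sql));
        }

        /** Returns up to maxRows buffered rows and removes them from the buffer. */
        List<List<Object>> fetchResults(int maxRows) {
            int count = Math.min(maxRows, pending.size());
            List<List<Object>> page = new ArrayList<>(pending.subList(0, count));
            pending = new ArrayList<>(pending.subList(count, pending.size()));
            return page;
        }
    }

    public static void main(String[] args) {
        // Fake backend with two columns of three values each.
        DrillBackend fake = sql -> Arrays.asList(
                Arrays.<Object>asList(1, 2, 3),
                Arrays.<Object>asList("a", "b", "c"));

        ShimHandler handler = new ShimHandler(fake);
        handler.executeStatement("SELECT id, name FROM t");
        System.out.println(handler.fetchResults(2));
        System.out.println(handler.fetchResults(2));
    }
}
```

Keeping the backend behind a small interface is what would let the same handler later run inside the Drillbit: only the DrillBackend implementation would change, not the client-facing side.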





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
