Re: A light-weight, versioned client API for Drill

Julian Hyde Thu, 21 Jul 2016 14:01:55 -0700

Avatica has changed a lot since Drill first started using it. Now its default 
wire format is protobuf (driven by the need for something more compact and 
efficient than JSON, and also better at supporting mismatched client/server 
versions) and there are also a bunch of improvements relating to security and 
HA. Lastly, it is released independently of Calcite, so you can pull in a new 
version of Avatica without revving Calcite.


Read the release history to see what features have been added recently:
http://calcite.apache.org/avatica/docs/history.html

Julian



> On Jul 20, 2016, at 6:10 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi Julian,
> 
> Thanks! This is the kind of suggestion I was looking for.
> 
> I did, in fact, take a look at Avatica: Drill uses it for the existing JDBC 
> driver. To be honest, I was a bit concerned about the overhead of converting 
> rows to/from JSON. Have you looked at fitting a binary protocol under 
> Avatica? Would sure be great to reuse the work already done to handle the 
> many JDBC complexities.
> 
> - Paul
> 
>> On Jul 20, 2016, at 1:39 PM, Julian Hyde <[email protected]> wrote:
>> 
>> Did you consider Avatica? Identical goals, it works already, and there
>> are clients in several languages.
>> 
>> Julian
>> 
>> 
>> On Wed, Jul 20, 2016 at 10:35 AM, Chunhui Shi <[email protected]> wrote:
>>> Cool. And we know that many 'light weight' APIs have soon become the
>>> mainstream APIs.
>>> 
>>> On Tue, Jul 19, 2016 at 10:56 PM, Paul Rogers <[email protected]> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> As I’ve been playing with and learning about Drill, it struck me that
>>>> Drill is a wonderful “industrial strength” query engine, but that the
>>>> client API is a bit complex if all an app wants to do is execute a few
>>>> queries. I wondered if we need an adapter between the full-blown Drill
>>>> columnar, asynchronous RPC that Drill uses internally, and the row-based,
>>>> synchronous API that most apps know and love.
>>>> 
>>>> In thinking about a simpler client API, a few items came to mind:
>>>> 
>>>> - We have the JDBC API for Java apps, but the internals of the current
>>>> JDBC use the Drill client and so the JDBC jar is quite big (20MB).
>>>> 
>>>> - The current client API is not versioned, requiring clients to be
>>>> upgraded in lock-step with servers. Many admins, however, find it necessary
>>>> to upgrade clients on a schedule different from that of the server.
>>>> (Imagine upgrading dozens of desktop users at the same time as the Drill
>>>> cluster.) Many of the traditional DB products version their interfaces to
>>>> simplify this task.
>>>> 
>>>> - A cool feature of Drill is schema-on-read, which means Drill may
>>>> encounter different schemas as data is read. At present, it is a bit hard
>>>> for clients to consume different schemas. It turns out, however, that
>>>> stored procedures provide something similar (multiple result sets), and we
>>>> could leverage that idea to make schema changes into a first-class feature
>>>> of the API.
>>>> 
>>>> Playing around a bit in my spare time, I found that we can grab lots of
>>>> ideas from “traditional” DB APIs to solve the above problems (and more):
>>>> 
>>>> - A simplified client API provides a row-based view of results, with
>>>> schema changes as a first-class API concept.
>>>> - A “direct” version of the client can sit directly on top of the Drill
>>>> Client, much like the current JDBC driver.
>>>> - Because the client API is simple, it is easy to create a new wire
>>>> protocol to carry the required row-based client messages.
>>>> - That wire protocol enables a very light-weight remote version of the
>>>> client API.
>>>> - A new server implements the server-side of the new wire protocol. The
>>>> server is an adapter: it converts the “retail” row-based API into the
>>>> “wholesale” columnar API of Drill.
>>>> - A new JDBC implementation uses the remote API instead of directly using
>>>> the Drill Client API.
>>>> 
>>>> Because the remote client has no dependencies on Drill (or, indeed,
>>>> anything other than the JDK), it is very small.  Indeed, the revised JDBC
>>>> jar is about 1% of the size of the existing JDBC driver. (200KB instead of
>>>> 20MB.)
>>>> 
>>>> The result is a little prototype project called “Jig”. I’d like to toss it
>>>> out to the community to see if this is something of interest to others. The
>>>> code works just well enough to prove the concept, though I’ve left off the
>>>> more “advanced” data types, multiple cursors per connection, and other
>>>> details.
>>>> 
>>>> The advantage for Java users is a simpler API, smaller JDBC driver, fewer
>>>> dependencies and cross-version compatibility.
>>>> 
>>>> If we add clients in other languages, then just about any language can
>>>> easily query Drill without a Java or ODBC bridge. This would be handy for
>>>> that Caravel integration project discussed here a month or so back. Also
>>>> for data scientists who prefer Python or R.
>>>> 
>>>> In case there is interest in this idea, a more detailed proposal is
>>>> available:
>>>> https://docs.google.com/document/d/1TpJOEUO-DBDGIidOML2_InpJ-fK4yHmsbV5ncqXT6pM
>>>> 
>>>> The code is in a GitHub repo: https://github.com/paul-rogers/drill-jig
>>>> 
>>>> The JIRA for this enhancement: DRILL-4791:
>>>> https://issues.apache.org/jira/browse/DRILL-4791
>>>> 
>>>> This has been a great little learning exercise. Is this something that
>>>> we might want to take further? Thoughts on the approach taken?
>>>> 
>>>> Thanks,
>>>> 
>>>> - Paul
>>>> 
>>>> 
>>>> 
>
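
The adapter idea in the original proposal (a row-based view over columnar
results, with each schema change surfaced as a new result set, much like the
multiple result sets of a stored procedure) can be illustrated with a minimal
Python sketch. Everything here is hypothetical: ColumnBatch and result_sets
are names invented for illustration, not part of Drill, Jig, or Avatica, and
the sketch ignores the wire protocol, data types, and asynchronous delivery.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Any, ...]
Schema = Tuple[str, ...]


@dataclass
class ColumnBatch:
    """Hypothetical stand-in for a columnar batch: column name -> values."""
    columns: Dict[str, List[Any]]

    @property
    def schema(self) -> Schema:
        # Column names in insertion order act as the batch schema.
        return tuple(self.columns)

    def rows(self) -> Iterator[Row]:
        # zip(*columns) pivots the column vectors into row tuples.
        return zip(*self.columns.values())


def result_sets(batches: Iterable[ColumnBatch]) -> Iterator[Tuple[Schema, List[Row]]]:
    """Yield (schema, rows) pairs; a schema change starts a new result set."""
    current_schema: Optional[Schema] = None
    current_rows: List[Row] = []
    for batch in batches:
        if current_schema is not None and batch.schema != current_schema:
            # The schema changed: close the current result set first.
            yield current_schema, current_rows
            current_rows = []
        current_schema = batch.schema
        current_rows.extend(batch.rows())
    if current_schema is not None:
        yield current_schema, current_rows


if __name__ == "__main__":
    batches = [
        ColumnBatch({"a": [1, 2], "b": ["x", "y"]}),
        ColumnBatch({"a": [3], "b": ["z"]}),
        ColumnBatch({"a": [4], "c": [True]}),  # schema change
    ]
    for schema, rows in result_sets(batches):
        print(schema, rows)
```

A client consuming this would loop over result sets the way a JDBC caller
loops with Statement.getMoreResults(), reading rows until the schema changes
and then moving on to the next result set.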
