Re: Thinking about Drill 2.0

Paul Rogers Mon, 12 Jun 2017 21:12:07 -0700

Thanks for the suggestions!

The issue is only partly Calcite changes. The real challenge for potential
contributors is that the Drill storage plugin API exposes Calcite mechanisms
directly. That is, to write a storage plugin, one must know (or, more likely,
experiment to learn) the odd sequence of calls made to the storage plugin: for a
group scan, then a sub scan, then this or that. Having learned those calls, one
must then map what one wants to do onto them. In some cases, as Calcite chugs
along, it calls the same methods multiple times, so the plugin writer has to be
prepared to implement caching to avoid banging on the underlying system
repeatedly for the same data.
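
To make that caching burden concrete, here is a minimal Java sketch, assuming a simple per-query cache is acceptable. `MetadataCache` and its loader function are hypothetical names for illustration, not part of Drill's storage plugin API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-query cache: memoizes expensive metadata lookups (listing
// files, fetching a table schema, and so on) so that repeated planner callbacks
// for the same key reach the underlying system only once.
final class MetadataCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader; // the expensive call to the external system

  MetadataCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    // For a ConcurrentHashMap, computeIfAbsent runs the loader at most once per
    // key; later calls for the same key return the cached value.
    return cache.computeIfAbsent(key, loader);
  }
}
```

A real plugin would also have to decide the cache's lifetime (for example, one instance per query) so that metadata does not go stale.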

The key opportunity here is to observe that the current API is at the
implementation level: it consists of callbacks from Calcite. (The Drill “easy”
storage plugin does hide some of the details, though.) Instead, we’d like an API
at the definition level, in which the plugin simply declares that, say, it can
return a schema, or can handle certain kinds of filter push-down.
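
As a rough illustration of the definition-level idea, here is a Java sketch of what such a declaration might look like. Every name in it (`PluginCapabilities`, `FilterOp`, `KeyValueCapabilities`) is hypothetical; this is not an existing Drill or Calcite interface.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical kinds of filter a plugin might declare it can push down.
enum FilterOp { EQUALS, LESS_THAN, GREATER_THAN, LIKE }

// Hypothetical definition-level API: the plugin *declares* what it can do
// instead of responding to a sequence of planner callbacks.
interface PluginCapabilities {
  // Column definitions for a table (empty if the schema is only known at read time).
  List<String> schema(String table);

  // Filter operators the underlying system can evaluate itself.
  Set<FilterOp> supportedFilters();

  // Whether the system can return only the requested columns.
  boolean supportsProjectionPushDown();
}

// Example declaration for a simple key-value store that only supports equality lookups.
final class KeyValueCapabilities implements PluginCapabilities {
  @Override public List<String> schema(String table) {
    return List.of("key VARCHAR", "value VARCHAR");
  }

  @Override public Set<FilterOp> supportedFilters() {
    return EnumSet.of(FilterOp.EQUALS);
  }

  @Override public boolean supportsProjectionPushDown() {
    return false;
  }
}
```

The plugin author writes only declarations like `KeyValueCapabilities`; deciding when and how to consult them would be the planner-side code's job.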

If we can define that API at the metadata (planning) level, then we can create
an adapter between that API and Calcite. Doing so makes the plugin much easier
to test and isolates it from future code changes as Calcite evolves and
improves: the adapter changes, but the plugin metadata API does not.
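
A minimal sketch of the adapter idea, again with hypothetical names (`Predicate`, `FilterCapability`, `FilterPushDownAdapter`): the plugin only declares which comparison operators it can evaluate, and the adapter, the only piece that would need to track Calcite's planner API, splits a query's AND-ed filter conditions into those pushed to the plugin and those Drill must still apply itself. Because the split is plain Java, it can be unit-tested without running the planner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical planner-neutral predicate: column <op> value.
final class Predicate {
  final String column;
  final String op;
  final Object value;

  Predicate(String column, String op, Object value) {
    this.column = column;
    this.op = op;
    this.value = value;
  }
}

// Hypothetical declaration supplied by the plugin.
interface FilterCapability {
  Set<String> pushableOps();
}

// Outcome of the split: what the plugin evaluates vs. what stays in Drill.
final class FilterSplit {
  final List<Predicate> pushed = new ArrayList<>();
  final List<Predicate> residual = new ArrayList<>();
}

// Hypothetical adapter logic. The input must be the conjuncts of an AND,
// which is what makes it safe to push some terms down and keep the rest.
final class FilterPushDownAdapter {
  static FilterSplit split(List<Predicate> conjuncts, FilterCapability plugin) {
    FilterSplit result = new FilterSplit();
    Set<String> ops = plugin.pushableOps();
    for (Predicate p : conjuncts) {
      if (ops.contains(p.op)) {
        result.pushed.add(p);
      } else {
        result.residual.add(p);
      }
    }
    return result;
  }
}
```

If the planner's internals change, only the code that builds the `Predicate` list and applies the `FilterSplit` would need updating; the plugin's declaration stays the same.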

As you suggest, the resulting definition API would be handy to share between 
projects.

On the execution side, however, Drill plugins are very specific to Drill’s 
operator framework, Drill’s schema-on-read mechanism, Drill’s special columns 
(file metadata, partitions), Drill’s vector “mutators” and so on. Here, any 
synergy would be with Arrow, to define a common “mutator” API so that a “row
batch reader” written for one system would work with the other.
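
To sketch what a shared “mutator” API might look like, assuming a simple row-at-a-time writer: a reader is written once against an engine-neutral writer interface, and each engine (Drill value vectors, Arrow vectors) would supply its own implementation. `RowWriter`, `RowBatchReader`, `ArrayReader`, and `ListRowWriter` are illustrative names, not part of Drill or Arrow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical engine-neutral "mutator" API: readers write values through it
// and never touch Drill value vectors or Arrow vectors directly.
interface RowWriter {
  void setInt(int column, int value);
  void setString(int column, String value);
  void saveRow();      // commit the current row to the batch
  boolean isFull();    // the batch has reached its size limit
}

// Hypothetical reader contract: fill one batch per call; false means no more data.
interface RowBatchReader {
  boolean next(RowWriter writer);
}

// Example reader over in-memory rows of the form {id, name}.
final class ArrayReader implements RowBatchReader {
  private final String[][] rows;
  private int pos = 0;

  ArrayReader(String[][] rows) { this.rows = rows; }

  @Override public boolean next(RowWriter writer) {
    if (pos >= rows.length) return false;
    while (pos < rows.length && !writer.isFull()) {
      writer.setInt(0, Integer.parseInt(rows[pos][0]));
      writer.setString(1, rows[pos][1]);
      writer.saveRow();
      pos++;
    }
    return true;
  }
}

// Stand-in engine binding that collects rows into a list; a Drill or Arrow
// binding would implement the same interface over its own vectors.
final class ListRowWriter implements RowWriter {
  final List<Object[]> rows = new ArrayList<>();
  private final int width;
  private final int capacity;
  private Object[] current;

  ListRowWriter(int width, int capacity) {
    this.width = width;
    this.capacity = capacity;
    this.current = new Object[width];
  }

  public void setInt(int column, int value) { current[column] = value; }
  public void setString(int column, String value) { current[column] = value; }
  public void saveRow() { rows.add(current); current = new Object[width]; }
  public boolean isFull() { return rows.size() >= capacity; }
}
```

The reader never learns how or where values are stored, which is the property that would let one reader serve both engines.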

In any case, this kind of sharing is hard to define up front. Instead, we might
keep the discussion going to see what works for Drill, what we can abstract
out, and how we can make the common abstraction work for other systems beyond
Drill.

Thanks,

- Paul

> On Jun 9, 2017, at 3:38 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> 
>> On Jun 5, 2017, at 11:59 AM, Paul Rogers <prog...@mapr.com> wrote:
>> 
>> Similarly, the storage plugin API exposes details of Calcite (which seems to 
>> evolve with each new version), exposes value vector implementations, and so 
>> on. A cleaner, simpler, more isolated API will allow storage plugins to be 
>> built faster, but will also isolate them from Drill internals changes. 
>> Without isolation, each change to Drill internals would require plugin 
>> authors to update their plugin before Drill can be released.
> 
> Sorry you’re getting burned by Calcite changes. We try to minimize impact, 
> but sometimes it’s difficult to see what you’re breaking.
> 
> I like the goal of a stable storage plugin API. Maybe it’s something Drill 
> and Calcite can collaborate on? Much of the DNA of an adapter is independent 
> of the engine that will consume the data (Drill or otherwise) - it concerns 
> how to create a connection, get metadata, push down logical operations,
> and generate queries in the target system’s query language.
> Calcite and Drill ought to be able to share that part, rather than 
> maintaining separate collections of adapters.
> 
> Julian
>
