Re: "Crude-but-effective" Arrow integration

Ted Dunning Mon, 20 Aug 2018 09:41:56 -0700

Inline.

On Mon, Aug 20, 2018 at 9:20 AM Paul Rogers <[email protected]>
wrote:

> ...
> By contrast, migrating Drill internals to Arrow has always been seen as
> the bulk of the cost; costs which the "crude-but-effective" suggestion
> seeks to avoid. Some of the full-integration costs include:
>
> * Reworking Drill's direct memory model to work with Arrow's.
>

This should be relatively isolated to the allocation/deallocation code. The
deallocation should become a no-op. The allocation becomes simpler and
safer.

> * Changing all low-level runtime code that works with vectors to instead
> work with Arrow vectors.
>

Why? You already said that most code doesn't have to change since the
format is the same.

> * Change all Drill's vector metadata, and code that uses that metadata, to
> use Arrow's metadata instead.
>

Why? You said that converting Arrow metadata to Drill's metadata would be
simple. Why not just continue with that?

> * Since generated code works directly with vectors, change all the code
> generation.
>

Why? You said the UDFs would just work.

> * Since Drill vectors and metadata are exposed via the Drill client to
> JDBC and ODBC, those must be revised as well.
>

How much revision would that take, given the high level of compatibility?

> * Since the wire format will change, clients of Drill must upgrade their
> JDBC/ODBC drivers when migrating to an Arrow-based Drill.
>

Doesn't this have to happen fairly often anyway?

Perhaps this would be a good excuse for a 2.0 step.
