The correct approach might be to improve DataFusion support in
delta-rs. TableProvider is already implemented here:
https://github.com/delta-io/delta-rs/blob/main/rust/src/delta_datafusion.rs

I've pinged QP to ask for their advice.

Neville

On Wed, 9 Jun 2021 at 19:58, Andrew Lamb <al...@influxdata.com> wrote:

> I think the idea of DataFusion + DeltaLake is quite compelling and likely
> useful.
>
> However, I think DataFusion is ideally an "embeddable query engine" rather
> than a database system in itself, so in that mental model Delta Lake
> integration belongs somewhere other than the core DataFusion crate.
>
> My ideal structure would be a new crate (maybe not even part of the Apache
> Arrow Project), perhaps called `datafusion-delta-rs`, that contained the
> TableProvider and whatever else was needed to integrate DataFusion with
> DeltaLake.
>
> This structure could also start a pattern of publishing plugins for
> DataFusion separately from the core.
>
> Andrew
> p.s. now that Arrow is publishing more incrementally (e.g. 4.1.0, 4.2.0,
> etc), I think delta-rs[1] and datafusion both only specify `4.x` so they
> should work together nicely
>
> [1] https://github.com/delta-io/delta-rs/blame/main/rust/Cargo.toml
>
> On Wed, Jun 9, 2021 at 2:29 AM Daniël Heres <danielhe...@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to receive some feedback about adding Delta Lake support to
> > DataFusion (https://github.com/apache/arrow-datafusion/issues/525).
> > As you might know, Delta Lake <https://delta.io/> is a format adding
> > features like ACID transactions, statistics, and storage optimization to
> > Parquet and is gaining considerable traction for managing data lakes.
> > It seems like a great feature to have in DataFusion as well.
> >
> > The delta-rs <https://github.com/delta-io/delta-rs> project provides a
> > native, Apache licensed, Rust implementation of Delta Lake, already
> > supporting a large part of the format and operations.
> >
> > The first integration I would like to propose is adding read support via a
> > new TableProvider. There might be some work to do around dependencies as
> > both DataFusion and delta-rs rely on (certain versions of) Arrow and
> > Parquet.
> >
> > Let me know if you have any further ideas or concerns.
> >
> > Best regards,
> >
> > Daniël Heres
> >
>
