Hi community,
how can we drive this topic forward? A Jira issue has been created:
https://issues.apache.org/jira/projects/NIFI/issues/NIFI-6976

"A table in Delta Lake is both a batch table, as well as a streaming source
and sink. Streaming data ingest, batch historic backfill, and interactive
queries all just work out of the box." (Delta.io)

This is the decisive argument for me. A very impressive technological
milestone that is just crying out to be implemented in NiFi. You can find
all the details in this video: https://youtu.be/VLd_qOrKrTI

Delta Lake comes from Databricks, and Delta tables can also be queried
with Athena and Presto. In our case it would be great to extract data from
a database or any other source (it can be streaming) and send that data or
stream to our Databricks cluster.

I imagine it just like in the video. You have a Delta Lake processor where
you can define which Databricks cluster the data should go to and which
Delta Lake operation (upsert, merge, delete, ...) should be applied to the
data. That means Databricks becomes just the executing component, and I no
longer have to write code in Databricks notebooks. I also like the
possibility of requesting a dedicated cluster from the processor.
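To make the operations concrete: a Delta MERGE (upsert) matches incoming rows against the target table on a key, updates the matches, and inserts the rest. Here is a toy, stdlib-only Python sketch of that semantics — the row layout and the key name "id" are illustrative assumptions, not the actual Delta Lake or NiFi API:

```python
def merge_upsert(target, updates, key="id"):
    """Toy upsert: rows in `updates` replace matching rows in `target`
    (matched on `key`) and are appended otherwise, mirroring the
    WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT branches of a
    Delta Lake MERGE statement."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        by_key[row[key]] = dict(row)  # update if matched, insert if not
    return list(by_key.values())

table = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
incoming = [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}]
merged = merge_upsert(table, incoming)
# id 2 is updated in place, id 3 is appended
```

A hypothetical processor would let you pick this operation and the join key as processor properties, with Databricks executing the actual MERGE.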

Being able to do the same with Athena and Presto would be a dream!
