Re: [DISCUSS] Hudi Reverse Streamer

Prashant Wason Thu, 30 Mar 2023 21:39:53 -0700

Could be useful. It may also help with backup/replication scenarios
(keeping a copy of the data in an alternate or cloud DC).


HoodieDeltaStreamer already has the concept of "sources"; this could be
implemented as a complementary "sink" concept.
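To make the "sink" idea concrete, here is a minimal sketch, assuming a Hudi
table can be modeled as a list of records tagged with fixed-width commit
instant times. `Record`, `Sink`, `ListSink`, `incremental_pull` and
`reverse_stream_once` are hypothetical names, not Hudi APIs; a real tool
would use Hudi's incremental query instead of the in-memory filter below.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    # Assumed fixed-width instant time (e.g. "20230330201200"), so string
    # comparison matches chronological order.
    commit_time: str
    key: str
    value: dict


class Sink(Protocol):
    """Hypothetical sink interface: the mirror image of a DeltaStreamer source."""

    def write(self, records: List[Record]) -> None: ...


class ListSink:
    """Toy sink that collects records in memory (stand-in for Kafka/JDBC/DFS)."""

    def __init__(self) -> None:
        self.received: List[Record] = []

    def write(self, records: List[Record]) -> None:
        self.received.extend(records)


def incremental_pull(table: Iterable[Record], after: str) -> List[Record]:
    """Return records committed strictly after the checkpoint, in commit order."""
    return sorted((r for r in table if r.commit_time > after),
                  key=lambda r: r.commit_time)


def reverse_stream_once(table: Iterable[Record], sinks: List[Sink],
                        checkpoint: str) -> str:
    """Push one incremental batch to every sink and return the new checkpoint."""
    batch = incremental_pull(table, checkpoint)
    if not batch:
        return checkpoint  # nothing new since the last run
    for sink in sinks:
        sink.write(batch)
    # Advance the checkpoint only after all sinks accepted the batch, so a
    # failure mid-way leads to a retry (at-least-once delivery).
    return batch[-1].commit_time
```

Persisting the returned checkpoint between runs is what would make this
"reverse" of DeltaStreamer resumable, in the same way DeltaStreamer tracks
its source checkpoint.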

On Thu, Mar 30, 2023 at 8:12 PM Vinoth Chandar <[email protected]> wrote:

> Essentially.
>
> Old architecture: (operational database) ==> some tool ==>
> (data warehouse raw data) ==> SQL ETL ==> (data warehouse derived data)
>
> New architecture: (operational database) ==> Hudi DeltaStreamer ==>
> (Hudi raw data) ==> Spark/Flink Hudi ETL ==> (Hudi derived data) ==>
> Hudi Reverse Streamer ==> (Data Warehouse/Kafka/Operational Database)
>
> On Thu, Mar 30, 2023 at 8:09 PM Vinoth Chandar <[email protected]> wrote:
>
> > Hi all,
> >
> > Any interest in building a reverse streaming tool that does the reverse
> > of what the DeltaStreamer tool does? It would read a Hudi table
> > incrementally (as its only source) and write the data out to a variety
> > of sinks: Kafka, JDBC databases, DFS.
> >
> > This has come up many times with data warehouse users. Oftentimes they
> > want to use Hudi to speed up or reduce the cost of their data ingestion
> > and ETL (using Spark/Flink), but want to move the derived data back into
> > a data warehouse or an operational database for serving.
> >
> > What do you all think?
> >
> > Thanks
> > Vinoth
> >
>
