Hi,

A common pattern I see is one Kafka topic of data change events feeding
two Hudi ingestion jobs, one in insert mode and one in upsert mode. This
produces two tables: one with all raw data change events and one with the
latest snapshot of the data.
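
To make the two modes concrete, here is a rough sketch in plain Python. It
does not use Hudi at all; the ChangeEvent fields and the two helper
functions are hypothetical names chosen for illustration, and the "tables"
are just an in-memory list and dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    key: str    # record key (hypothetical field name)
    ts: int     # ordering field, e.g. event time (hypothetical)
    value: str  # payload


def apply_insert(raw_table, batch):
    """Insert mode: append every event, so the full history is kept."""
    raw_table.extend(batch)


def apply_upsert(snapshot_table, batch):
    """Upsert mode: keep only the most recent event per record key."""
    for event in batch:
        current = snapshot_table.get(event.key)
        if current is None or event.ts >= current.ts:
            snapshot_table[event.key] = event


# One batch read from the topic, applied to both tables.
batch = [
    ChangeEvent("a", 1, "v1"),
    ChangeEvent("b", 1, "v1"),
    ChangeEvent("a", 2, "v2"),
]
raw_table = []       # all raw data change events
snapshot_table = {}  # latest snapshot, keyed by record key

apply_insert(raw_table, batch)
apply_upsert(snapshot_table, batch)
```

Here the raw table ends up with all three events, while the snapshot table
holds one row per key, with the later event winning for key "a".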

What do you guys think about adding support for this pattern as an option
in DeltaStreamer?

There are some complications to consider:
- Can we update both tables transactionally? This would be a nice property
to have. The current 2-job pattern does not support it.
- Can we share the Avro logs? This might save some time and help achieve
the transactionality mentioned above, but it increases complexity.

Best,
Minh
