Vinoth Govindarajan created HUDI-1790:
-----------------------------------------
Summary: Add SqlSource for DeltaStreamer to support backfill use cases
Key: HUDI-1790
URL: https://issues.apache.org/jira/browse/HUDI-1790
Project: Apache Hudi
Issue Type: New Feature
Components: DeltaStreamer
Reporter: Vinoth Govindarajan
Assignee: Vinoth Govindarajan
DeltaStreamer works well for incremental workloads, but it also needs to
support backfills. Two example use cases: a new column is added and only that
column must be backfilled for the last 6 months, or a bug in our transformation
logic requires reprocessing a couple of older partitions.
If SqlSource were available as one of the input sources to DeltaStreamer, I
could pass any custom Spark SQL query that selects specific partitions and
backfill from its result.
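
As a rough illustration of the kind of query such a source might receive, the
sketch below builds a Spark SQL statement restricted to a range of partitions.
The table name `db.events`, the partition column `datestr`, and the helper
`build_backfill_query` are hypothetical and not part of Hudi.

```python
from datetime import date


def build_backfill_query(table, partition_col, start, end, columns=("*",)):
    """Return a Spark SQL string selecting rows from partitions in [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} FROM {table} "
        f"WHERE {partition_col} BETWEEN '{start.isoformat()}' "
        f"AND '{end.isoformat()}'"
    )


# Hypothetical example: reprocess two older daily partitions.
query = build_backfill_query(
    "db.events", "datestr", date(2021, 1, 1), date(2021, 1, 2)
)
print(query)
```

The resulting string is what a user would hand to the proposed source. How the
source would accept it (property name, file, or CLI flag) is not specified in
this issue.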
A backfill should not advance the last processed commit checkpoint. Instead,
the checkpoint recorded before the backfill should be copied over to the
backfill commit, so that incremental processing resumes from the same point.
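
The checkpoint carry-over described above could be sketched roughly as follows.
This is a simplified model, not Hudi code: the timeline is a plain list of
metadata dicts, `backfill_commit_metadata` is a hypothetical helper, and the
metadata key name is an assumption that should be checked against the Hudi
version in use.

```python
# Assumed name of the commit-metadata key holding the DeltaStreamer checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def backfill_commit_metadata(timeline, extra=None):
    """Build metadata for a backfill commit, carrying the last checkpoint forward.

    `timeline` is a list of commit-metadata dicts, oldest first.
    """
    metadata = dict(extra or {})
    # Walk back from the newest commit to find the most recent checkpoint.
    for commit in reversed(timeline):
        if CHECKPOINT_KEY in commit:
            metadata[CHECKPOINT_KEY] = commit[CHECKPOINT_KEY]
            break
    return metadata


# Hypothetical timeline with two incremental commits.
timeline = [
    {CHECKPOINT_KEY: "topic,0:100"},
    {CHECKPOINT_KEY: "topic,0:250"},
]
backfill_meta = backfill_commit_metadata(timeline, {"backfill": "true"})
timeline.append(backfill_meta)
print(backfill_meta[CHECKPOINT_KEY])
```

Because the backfill commit repeats the previous checkpoint instead of writing
a new one, the next incremental run reading the latest commit would see the
same checkpoint as before the backfill.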
cc [~nishith29]