[
https://issues.apache.org/jira/browse/HUDI-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Govindarajan updated HUDI-1790:
--------------------------------------
Status: In Progress (was: Open)
> Add SqlSource for DeltaStreamer to support backfill use cases
> -------------------------------------------------------------
>
> Key: HUDI-1790
> URL: https://issues.apache.org/jira/browse/HUDI-1790
> Project: Apache Hudi
> Issue Type: New Feature
> Components: DeltaStreamer
> Reporter: Vinoth Govindarajan
> Assignee: Vinoth Govindarajan
> Priority: Major
>
> Delta Streamer is great for incremental workloads, but we also need to support
> backfills, for example adding a new column and backfilling only that column
> for the last 6 months, or reprocessing a couple of older partitions after a
> bug in our transformation logic.
>
> If we have a SqlSource as one of the input sources to the Delta Streamer, then
> I can pass any custom Spark SQL query that selects specific partitions and
> backfill them.
>
> When we do the backfill, we should not advance the last processed commit
> checkpoint. Instead, the last processed checkpoint from before the backfill
> has to be copied over to the backfill commit.
>
> cc [~nishith29]
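
The two ideas in the description (a partition-scoped SQL query as the source, and carrying the previous checkpoint over to the backfill commit) could be sketched roughly as follows. This is only an illustration in plain Python, not the SqlSource implementation: the helper names, the table and column names, the list-of-dicts timeline model, and the metadata key used here are all assumptions made for the sketch.

```python
from typing import Dict, List

# Assumed name of the commit-metadata entry that holds the checkpoint.
# Treat it as a placeholder for this sketch.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def build_backfill_query(table: str, partition_col: str, partitions: List[str]) -> str:
    """Build a Spark SQL query limited to the partitions being backfilled."""
    if not partitions:
        raise ValueError("at least one partition is required")
    # Quote each partition value, doubling embedded single quotes.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in partitions)
    return f"SELECT * FROM {table} WHERE {partition_col} IN ({quoted})"


def backfill_commit_metadata(timeline: List[Dict[str, str]]) -> Dict[str, str]:
    """Return extra metadata for the backfill commit.

    The timeline is modeled as a list of per-commit metadata dicts, oldest
    first. The most recent checkpoint is copied unchanged, so the next
    incremental run resumes from where it stopped before the backfill.
    """
    for commit in reversed(timeline):
        if CHECKPOINT_KEY in commit:
            return {CHECKPOINT_KEY: commit[CHECKPOINT_KEY]}
    return {}  # no earlier checkpoint, so nothing to carry over


# Hypothetical usage: two earlier incremental commits, then a backfill.
timeline = [
    {CHECKPOINT_KEY: "topic,0:100"},
    {CHECKPOINT_KEY: "topic,0:250"},
]
query = build_backfill_query("source_db.events", "datestr", ["2021-01-01", "2021-01-02"])
extra = backfill_commit_metadata(timeline)
```

Because the backfill commit repeats the previous checkpoint instead of writing a new one, a later incremental run that reads the latest commit still sees the pre-backfill position.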
--
This message was sent by Atlassian Jira
(v8.3.4#803005)