[ https://issues.apache.org/jira/browse/HUDI-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6738:
---------------------------------
Labels: pull-request-available (was: )
> Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
> ----------------------------------------------------------------------------
>
> Key: HUDI-6738
> URL: https://issues.apache.org/jira/browse/HUDI-6738
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Lokesh Lingarajan
> Priority: Major
> Labels: pull-request-available
>
> A recent refactoring to support batching within a commit for the GCS incr
> job moved object filtering to after checkpoint batching. The problem shows
> up in bootstrap scenarios where we are looking for only the latest commits:
> we have to go through the entire set of commits based on sourcelimit instead
> of skipping directly to the latest commit.
> The fix is to apply filtering before we start checkpoint batching. This
> change brings the GCS job in line with the S3 job.
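
The ordering issue described above can be illustrated with a small, self-contained sketch. This is not Hudi's code: the event tuples, the `take_batch` and `runs_until_drained` helpers, the `.parquet` filter, and the size-based limit are all hypothetical stand-ins, and commit times are assumed unique so that a commit time can serve as the checkpoint.

```python
from typing import Callable, List, Tuple

# Hypothetical event record: (commit_time, object_name, size_in_bytes).
Event = Tuple[str, str, int]


def take_batch(events: List[Event], checkpoint: str, source_limit: int) -> List[Event]:
    """Return events after `checkpoint`, in commit order, while total size fits source_limit.

    At least one event is always returned when any remain, so progress is guaranteed.
    """
    batch: List[Event] = []
    total = 0
    for ev in sorted(e for e in events if e[0] > checkpoint):
        if batch and total + ev[2] > source_limit:
            break
        batch.append(ev)
        total += ev[2]
    return batch


def runs_until_drained(
    events: List[Event],
    keep: Callable[[Event], bool],
    source_limit: int,
    filter_first: bool,
) -> Tuple[int, List[Event]]:
    """Count the batching rounds needed to move the checkpoint past every event."""
    # filter_first=True models the fix: drop unwanted objects before batching.
    candidates = [e for e in events if keep(e)] if filter_first else list(events)
    checkpoint, runs, selected = "", 0, []
    while True:
        batch = take_batch(candidates, checkpoint, source_limit)
        if not batch:
            return runs, selected
        runs += 1
        checkpoint = batch[-1][0]
        # When filtering happens after batching, unwanted objects still consumed the limit.
        selected.extend(e for e in batch if keep(e))


# Five objects the filter rejects, followed by the one object we care about.
events = [(f"{i:03d}", f"obj{i}.tmp", 100) for i in range(1, 6)]
events.append(("006", "obj6.parquet", 100))


def wanted(e: Event) -> bool:
    return e[1].endswith(".parquet")


late = runs_until_drained(events, wanted, source_limit=200, filter_first=False)
early = runs_until_drained(events, wanted, source_limit=200, filter_first=True)
```

In this toy setup both orderings select the same single object, but filtering after batching spends its limit on objects that are later discarded and so needs more rounds to reach it, while filtering first reaches it in the first round.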
--
This message was sent by Atlassian Jira
(v8.20.10#820010)