Lokesh Lingarajan created HUDI-6738:
---------------------------------------
Summary: Apply object filter before checkpoint batching in
GcsEventsHoodieIncrSource
Key: HUDI-6738
URL: https://issues.apache.org/jira/browse/HUDI-6738
Project: Apache Hudi
Issue Type: Bug
Reporter: Lokesh Lingarajan
A recent refactoring that added batching within a commit for the GCS incr job
moved the filtering of objects to after the checkpoint batching. This is a
problem in bootstrap scenarios where we are looking only for the latest
commits: we have to go through the entire set of commits, batch by batch based
on sourcelimit, instead of skipping directly to the latest commit.
The fix is to apply the filtering before we start checkpoint batching. This
change brings the GCS job in line with the S3 job.
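
To illustrate why the order matters, here is a minimal sketch in Python. It is
not the Hudi implementation: ObjectEvent, take_batch, batch_then_filter,
filter_then_batch and the byte-based source limit are all hypothetical names
and assumptions made up for this example. It only shows that batching first
can spend a whole batch on objects the filter would have dropped anyway.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ObjectEvent:
    """Hypothetical stand-in for one cloud object event row."""
    commit_time: str  # commit that recorded the event
    name: str         # object path
    size: int         # object size in bytes


def take_batch(events: List[ObjectEvent],
               source_limit: int) -> Tuple[List[ObjectEvent], str]:
    """Take events in commit order until source_limit bytes are used.

    Returns the batch and the checkpoint (last commit time consumed).
    At least one event is taken so the job always makes progress.
    """
    batch: List[ObjectEvent] = []
    used = 0
    for ev in sorted(events, key=lambda e: e.commit_time):
        if batch and used + ev.size > source_limit:
            break
        batch.append(ev)
        used += ev.size
    checkpoint = batch[-1].commit_time if batch else ""
    return batch, checkpoint


def batch_then_filter(events: List[ObjectEvent], source_limit: int,
                      keep: Callable[[ObjectEvent], bool]):
    """Order described as the bug: the limit is spent before filtering."""
    batch, checkpoint = take_batch(events, source_limit)
    return [ev for ev in batch if keep(ev)], checkpoint


def filter_then_batch(events: List[ObjectEvent], source_limit: int,
                      keep: Callable[[ObjectEvent], bool]):
    """Order proposed as the fix: only wanted objects count toward the limit."""
    return take_batch([ev for ev in events if keep(ev)], source_limit)


if __name__ == "__main__":
    events = [
        ObjectEvent("001", "a.json", 100),
        ObjectEvent("002", "b.json", 100),
        ObjectEvent("003", "c.parquet", 100),
        ObjectEvent("004", "d.parquet", 100),
    ]
    wanted = lambda ev: ev.name.endswith(".parquet")  # hypothetical filter
    print(batch_then_filter(events, 200, wanted))
    print(filter_then_batch(events, 200, wanted))
```

With a limit of 200 bytes, the batch-first order uses its whole batch on the
two .json objects and returns nothing useful, so later runs still have to walk
forward commit by commit. The filter-first order reaches the two .parquet
objects in a single batch.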