Lokesh Lingarajan created HUDI-6738:
---------------------------------------

             Summary: Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource
                 Key: HUDI-6738
                 URL: https://issues.apache.org/jira/browse/HUDI-6738
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Lokesh Lingarajan


A recent refactoring that added batching within a commit to the GCS incremental
job moved object filtering so that it now runs after checkpoint batching. In
bootstrap scenarios where we only want the latest commits, this forces the job
to walk through the entire set of commits, one sourceLimit-sized batch at a
time, instead of skipping directly to the latest commit.

The fix is to apply the object filter before checkpoint batching starts. This
change brings the GCS job in line with the S3 job.
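A minimal Java sketch of why the order matters, under a deliberately simplified model. Everything here is hypothetical and is not the actual GcsEventsHoodieIncrSource code: ObjectEvent, Batch, nextBatch, the byte-based source limit, and the "commitTime#path" checkpoint key are all assumptions made for illustration. The sample data has 100 commits containing only ".tmp" objects, followed by one commit containing ".parquet" objects, and the filter keeps only ".parquet". Batching first spends each run's source limit on objects that are filtered out afterwards; filtering first lets the very first run reach the matching commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterBeforeBatchingSketch {

    // Hypothetical event row: the commit it arrived in, the object path, and the object size.
    record ObjectEvent(String commitTime, String path, long sizeBytes) {
        // Hypothetical checkpoint key (commit + path) so a batch may stop inside a commit.
        String checkpointKey() { return commitTime + "#" + path; }
    }

    // Objects selected for this run plus the checkpoint the next run resumes from.
    record Batch(List<ObjectEvent> objects, String nextCheckpoint) {}

    interface Strategy {
        Batch apply(List<ObjectEvent> events, String checkpoint, long limit, Predicate<ObjectEvent> filter);
    }

    // Takes events after the checkpoint (input sorted by checkpointKey) up to sourceLimitBytes.
    static Batch nextBatch(List<ObjectEvent> events, String checkpoint, long sourceLimitBytes) {
        List<ObjectEvent> selected = new ArrayList<>();
        long total = 0;
        String next = checkpoint;
        for (ObjectEvent e : events) {
            if (checkpoint != null && e.checkpointKey().compareTo(checkpoint) <= 0) {
                continue; // already consumed by an earlier run
            }
            if (!selected.isEmpty() && total + e.sizeBytes() > sourceLimitBytes) {
                break; // limit reached; at least one event is always taken so the job progresses
            }
            selected.add(e);
            total += e.sizeBytes();
            next = e.checkpointKey();
        }
        return new Batch(selected, next);
    }

    // Order described as the regression: batch all events, then drop non-matching ones.
    static Batch batchThenFilter(List<ObjectEvent> events, String cp, long limit, Predicate<ObjectEvent> filter) {
        Batch b = nextBatch(events, cp, limit);
        return new Batch(b.objects().stream().filter(filter).collect(Collectors.toList()), b.nextCheckpoint());
    }

    // Order described as the fix: drop non-matching events, then batch what is left.
    static Batch filterThenBatch(List<ObjectEvent> events, String cp, long limit, Predicate<ObjectEvent> filter) {
        List<ObjectEvent> matching = events.stream().filter(filter).collect(Collectors.toList());
        return nextBatch(matching, cp, limit);
    }

    // Counts runs until a batch yields data; -1 if the checkpoint stops advancing first.
    static int runsUntilFirstData(Strategy s, List<ObjectEvent> events, long limit, Predicate<ObjectEvent> filter) {
        String cp = null;
        for (int runs = 1; ; runs++) {
            Batch b = s.apply(events, cp, limit, filter);
            if (!b.objects().isEmpty()) return runs;
            if (Objects.equals(b.nextCheckpoint(), cp)) return -1;
            cp = b.nextCheckpoint();
        }
    }

    // 100 commits of 10 ".tmp" objects each, then one commit with 2 ".parquet" objects.
    static List<ObjectEvent> sampleEvents() {
        List<ObjectEvent> events = new ArrayList<>();
        for (int c = 0; c < 100; c++) {
            for (int i = 0; i < 10; i++) {
                events.add(new ObjectEvent(String.format("c%04d", c), "part-" + i + ".tmp", 100));
            }
        }
        events.add(new ObjectEvent("c0100", "part-0.parquet", 100));
        events.add(new ObjectEvent("c0100", "part-1.parquet", 100));
        return events;
    }

    public static void main(String[] args) {
        List<ObjectEvent> events = sampleEvents();
        Predicate<ObjectEvent> parquetOnly = e -> e.path().endsWith(".parquet");
        System.out.println("batch-then-filter runs: "
            + runsUntilFirstData(FilterBeforeBatchingSketch::batchThenFilter, events, 1000, parquetOnly));
        System.out.println("filter-then-batch runs: "
            + runsUntilFirstData(FilterBeforeBatchingSketch::filterThenBatch, events, 1000, parquetOnly));
    }
}
```

With a 1000-byte limit, each batch-then-filter run consumes exactly one commit of ten 100-byte ".tmp" objects, so the sketch prints 101 runs for that order and 1 run for filter-then-batch. The gap grows linearly with the number of commits holding no matching objects, which is the bootstrap cost this issue describes.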



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
