Lokesh Lingarajan created HUDI-6687:
---------------------------------------

             Summary: S3/GCS incr job improvements
                 Key: HUDI-6687
                 URL: https://issues.apache.org/jira/browse/HUDI-6687
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Lokesh Lingarajan


# The current batched commit string has the form "commit#key". Given this, 
consider the following sample commit c1 with three keys:
c1->k1
c1->k2
c1->k3

Suppose a fetch ends exactly at c1#k3. Every fetch after that reads the entire 
commit c1 again and then ignores it, because all of its keys were already 
consumed.

To solve this, we need another flag inside the commit string, for example 
"commit#key#commit_complete_boolean_flag". The "commit_complete_boolean_flag" 
lets us avoid this suboptimal fetch when we end up in the above scenario.
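
The proposed string format could be sketched as below. This is only an 
illustration under stated assumptions: the class BatchedCheckpoint and its 
methods are hypothetical and are not part of Hudi, keys are assumed not to 
contain '#', and a legacy "commit#key" string is treated as "not complete" so 
that behaviour stays as it is today.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hudi: models "commit#key#commit_complete_boolean_flag".
final class BatchedCheckpoint {
  final String commit;
  final String key;
  final boolean commitComplete;

  BatchedCheckpoint(String commit, String key, boolean commitComplete) {
    this.commit = commit;
    this.key = key;
    this.commitComplete = commitComplete;
  }

  // Builds the checkpoint for a batch that stopped at lastKeyRead inside commit.
  // The flag is true only when lastKeyRead is the final key of that commit.
  static BatchedCheckpoint afterBatch(String commit, List<String> orderedKeys, String lastKeyRead) {
    boolean complete = !orderedKeys.isEmpty()
        && orderedKeys.get(orderedKeys.size() - 1).equals(lastKeyRead);
    return new BatchedCheckpoint(commit, lastKeyRead, complete);
  }

  // Parses both the legacy two-part form and the proposed three-part form.
  // Assumption: keys never contain '#'.
  static BatchedCheckpoint parse(String checkpoint) {
    String[] parts = checkpoint.split("#", -1);
    if (parts.length == 2) {
      // Legacy string: completeness is unknown, so the commit must be re-read.
      return new BatchedCheckpoint(parts[0], parts[1], false);
    }
    if (parts.length == 3) {
      return new BatchedCheckpoint(parts[0], parts[1], Boolean.parseBoolean(parts[2]));
    }
    throw new IllegalArgumentException("Unexpected checkpoint format: " + checkpoint);
  }

  String format() {
    return commit + "#" + key + "#" + commitComplete;
  }

  // True when the next fetch can start after this commit instead of re-reading it.
  boolean canSkipCommit() {
    return commitComplete;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("k1", "k2", "k3");
    System.out.println(afterBatch("c1", keys, "k3").format()); // c1#k3#true
    System.out.println(afterBatch("c1", keys, "k2").format()); // c1#k2#false
    System.out.println(parse("c1#k3").canSkipCommit());        // false
  }
}
```

With the flag set, a reader that sees "c1#k3#true" could begin the next fetch 
strictly after c1, while "c1#k2#false" would still require reading the 
remainder of c1.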


 # 
[https://github.com/apache/hudi/blob/05ac011316564f97de178b023e8e93ff768c37a4/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java#L183]
 - The QueryRunner.applyOrdering API call might not be needed, since we order 
the results again after filtering anyway. Filtering does not depend on this 
ordering, so we need to test and then remove this ordering call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
