Lokesh Lingarajan created HUDI-6687:
---------------------------------------
Summary: S3/GCS incr job improvements
Key: HUDI-6687
URL: https://issues.apache.org/jira/browse/HUDI-6687
Project: Apache Hudi
Issue Type: Improvement
Reporter: Lokesh Lingarajan
# The current batched commit string has the form "commit#key". Given this,
consider the following sample commit:
c1->k1
c1->k2
c1->k3
Suppose a fetch ends exactly at c1#k3. Every subsequent fetch would then read
the entire commit c1 again and ignore all of it.
To solve this, we would need another flag inside the commit string, like
"commit#key#commit_complete_boolean_flag". This "commit_complete_boolean_flag"
will help us avoid the suboptimal fetch if we end up in the above scenario.
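
A minimal sketch of how such a commit string could be parsed and written back.
The class BatchCheckpoint and its methods are hypothetical names used only for
illustration and are not existing Hudi code. It assumes the flag is serialized
as a trailing "#true" or "#false", and that a legacy "commit#key" string is
treated as an incomplete commit:

```java
/** Hypothetical holder for a "commit#key#commit_complete_boolean_flag" string. */
final class BatchCheckpoint {
    private static final char SEP = '#';

    final String commit;
    final String key;              // null when the string holds only a commit
    final boolean commitComplete;  // true when every key of the commit was consumed

    BatchCheckpoint(String commit, String key, boolean commitComplete) {
        this.commit = commit;
        this.key = key;
        this.commitComplete = commitComplete;
    }

    static BatchCheckpoint parse(String checkpoint) {
        int first = checkpoint.indexOf(SEP);
        if (first < 0) {
            // Plain commit with no key: nothing of this commit is left to read.
            return new BatchCheckpoint(checkpoint, null, true);
        }
        String commit = checkpoint.substring(0, first);
        String rest = checkpoint.substring(first + 1);
        if (rest.endsWith("#true")) {
            String key = rest.substring(0, rest.length() - "#true".length());
            return new BatchCheckpoint(commit, key, true);
        }
        if (rest.endsWith("#false")) {
            String key = rest.substring(0, rest.length() - "#false".length());
            return new BatchCheckpoint(commit, key, false);
        }
        // Legacy "commit#key": completeness is unknown, so assume more keys may follow.
        return new BatchCheckpoint(commit, rest, false);
    }

    String format() {
        return key == null ? commit : commit + SEP + key + SEP + commitComplete;
    }

    /** The next fetch only needs to re-read this commit when it was left incomplete. */
    boolean mustRereadCommit() {
        return !commitComplete;
    }
}
```

With this shape, a fetch that ends at the last key of c1 would store
"c1#k3#true", and the next fetch could start after c1 instead of reading it
and discarding every row.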
# [https://github.com/apache/hudi/blob/05ac011316564f97de178b023e8e93ff768c37a4/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java#L183]
- The QueryRunner.applyOrdering API call might not be needed, since we order
the data again after filtering anyway. Filtering does not need this ordering;
we need to test this and then remove the ordering call.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)