[
https://issues.apache.org/jira/browse/HUDI-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lokesh Lingarajan updated HUDI-6687:
------------------------------------
Description:
1.
The current batched commit string has the form "commit#key". Given this,
consider the following sample commit:
c1->k1
c1->k2
c1->k3
If a fetch ends exactly at c1#k3, then every subsequent fetch would read the
entire commit c1 again and then ignore it.
To solve this, we would need another flag inside the commit string, like
"commit#key#commit_complete_boolean_flag". This "commit_complete_boolean_flag"
will help us avoid this suboptimal fetch if we end up in the above scenario.
2.
[https://github.com/apache/hudi/blob/05ac011316564f97de178b023e8e93ff768c37a4/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java#L183]
- The QueryRunner.applyOrdering API call might not be needed, since we order
the data again after filtering anyway. Filtering does not need this ordering;
we need to test and then remove this ordering call.
3.
https://github.com/apache/hudi/pull/9433#discussion_r1291826591
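
A minimal sketch of how the flag proposed in item 1 could work. Everything here is hypothetical: the class BatchCheckpoint and its methods are not part of the Hudi code base, and the sketch assumes that neither the commit nor the key contains a '#' character. A two-part "commit#key" string is treated as "commit not complete" so that older checkpoints keep their current behaviour.

```java
import java.util.Objects;

/** Hypothetical holder for a "commit#key#flag" checkpoint; not a Hudi class. */
final class BatchCheckpoint {
    private static final String SEP = "#";

    final String commit;
    final String key;
    final boolean commitComplete;

    BatchCheckpoint(String commit, String key, boolean commitComplete) {
        this.commit = Objects.requireNonNull(commit);
        this.key = Objects.requireNonNull(key);
        this.commitComplete = commitComplete;
    }

    /** Parses "commit#key" (old form) or "commit#key#flag" (proposed form). */
    static BatchCheckpoint parse(String checkpoint) {
        // Assumption: commit and key never contain '#'.
        String[] parts = checkpoint.split(SEP, 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected checkpoint: " + checkpoint);
        }
        // Old two-part strings carry no flag, so the commit counts as incomplete.
        boolean complete = parts.length == 3 && Boolean.parseBoolean(parts[2]);
        return new BatchCheckpoint(parts[0], parts[1], complete);
    }

    /** Builds the checkpoint for a batch that ended at lastKeyInBatch. */
    static BatchCheckpoint forBatchEnd(String commit, String lastKeyInBatch, String lastKeyInCommit) {
        // The commit is complete only if the batch consumed its last key.
        return new BatchCheckpoint(commit, lastKeyInBatch, lastKeyInBatch.equals(lastKeyInCommit));
    }

    String format() {
        return commit + SEP + key + SEP + commitComplete;
    }

    /** True if the next fetch can start after candidateCommit instead of re-reading it. */
    boolean canSkip(String candidateCommit) {
        return commitComplete && commit.equals(candidateCommit);
    }

    public static void main(String[] args) {
        // Commit c1 holds k1, k2, k3 and the batch ended exactly at k3.
        BatchCheckpoint atEnd = forBatchEnd("c1", "k3", "k3");
        System.out.println(atEnd.format() + " skip c1: " + atEnd.canSkip("c1"));

        // The batch ended at k2, so c1 still has unread keys.
        BatchCheckpoint midway = forBatchEnd("c1", "k2", "k3");
        System.out.println(midway.format() + " skip c1: " + midway.canSkip("c1"));
    }
}
```

With the flag set, the next fetch can begin after c1 rather than reading all of c1 and discarding it. How the source would know the last key of a commit is left open here.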
was:
# The current batched commit string has the form "commit#key". Given this,
consider the following sample commit:
c1->k1
c1->k2
c1->k3
If a fetch ends exactly at c1#k3, then every subsequent fetch would read the
entire commit c1 again and then ignore it.
To solve this, we would need another flag inside the commit string, like
"commit#key#commit_complete_boolean_flag". This "commit_complete_boolean_flag"
will help us avoid this suboptimal fetch if we end up in the above scenario.
#
[https://github.com/apache/hudi/blob/05ac011316564f97de178b023e8e93ff768c37a4/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java#L183]
- The QueryRunner.applyOrdering API call might not be needed, since we order
the data again after filtering anyway. Filtering does not need this ordering;
we need to test and then remove this ordering call.
> S3/GCS incr job improvements
> ----------------------------
>
> Key: HUDI-6687
> URL: https://issues.apache.org/jira/browse/HUDI-6687
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Lokesh Lingarajan
> Priority: Minor
>
> 1.
> The current batched commit string has the form "commit#key". Given this,
> consider the following sample commit:
> c1->k1
> c1->k2
> c1->k3
> If a fetch ends exactly at c1#k3, then every subsequent fetch would read the
> entire commit c1 again and then ignore it.
> To solve this, we would need another flag inside the commit string, like
> "commit#key#commit_complete_boolean_flag". This
> "commit_complete_boolean_flag" will help us avoid this suboptimal fetch if
> we end up in the above scenario.
> 2.
> [https://github.com/apache/hudi/blob/05ac011316564f97de178b023e8e93ff768c37a4/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java#L183]
> - The QueryRunner.applyOrdering API call might not be needed, since we order
> the data again after filtering anyway. Filtering does not need this ordering;
> we need to test and then remove this ordering call.
>
> 3.
> https://github.com/apache/hudi/pull/9433#discussion_r1291826591