[
https://issues.apache.org/jira/browse/HUDI-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516464#comment-17516464
]
Yue Zhang commented on HUDI-3650:
---------------------------------
On the master branch, the following places call the
filterPendingCompactionTimeline API:
1. BaseHoodieWriteClient#runTableServicesInline
2. BaseHoodieWriteClient#runAnyPendingCompactions
3. BaseHoodieWriteClient#startCommit
4. RunCompactionActionExecutor#execute
5. SparkRDDWriteClient#compact
6. TimelineDiffHelper#getPendingCompactionTransitions
7. CompactionUtils#getAllPendingCompactionPlans
8. CompactionUtils#getPendingCompactionInstantTimes
9. CompactionUtils#rollbackCompaction
10. CompactionUtils#rollbackEarliestCompaction
11. HoodieFlinkCompactor#compact
12. HoodieInputFormatUtils#filterInstantsTimeline
13. CompactNode#execute
1, 2, 4, 5, 9, 10, 11 and 13 are all used for the compact action.
6, 7 and 8 all retrieve information about all pending compactions.
3 (BaseHoodieWriteClient#startCommit) validates the instant time of the
commit being started: if there are pending compactions, their instant times
must not be greater than this instant time.
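The guard described for startCommit can be sketched roughly as below. This
is only an illustration, not the Hudi code: the class and method names are
hypothetical, instant times are assumed to be fixed-width strings that
compare lexicographically, and the check follows the wording above (a
pending compaction is rejected only when its instant time is greater).

```java
import java.util.List;

public class StartCommitGuardSketch {

    /**
     * Hypothetical helper: fails if any pending compaction has an instant
     * time greater than the instant time of the commit being started.
     * Assumes instant times are fixed-width strings, so String.compareTo
     * orders them chronologically.
     */
    static void checkPendingCompactions(List<String> pendingCompactionTimes,
                                        String newInstantTime) {
        for (String pending : pendingCompactionTimes) {
            if (pending.compareTo(newInstantTime) > 0) {
                throw new IllegalArgumentException(
                    "Pending compaction " + pending
                        + " is later than new instant " + newInstantTime);
            }
        }
    }

    public static void main(String[] args) {
        // Passes: the pending compaction is earlier than the new instant.
        checkPendingCompactions(List.of("20220401100000"), "20220401110000");

        // Fails: the pending compaction is later than the new instant.
        try {
            checkPendingCompactions(List.of("20220401120000"), "20220401110000");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```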
Here is the Javadoc for HoodieInputFormatUtils#filterInstantsTimeline (12):
/**
* Filter any specific instants that we do not want to process.
* example timeline:
*
* t0 -> create bucket1.parquet
* t1 -> create and append updates bucket1.log
* t2 -> request compaction
* t3 -> create bucket2.parquet
*
* if compaction at t2 takes a long time, incremental readers on RO tables can move to t3 and would skip updates in t1
*
* To workaround this problem, we want to stop returning data belonging to commits > t2.
* After compaction is complete, incremental reader would see updates in t2, t3, so on.
* @param timeline
* @return
*/
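To make the example timeline in the Javadoc concrete, here is a minimal
sketch of the filtering idea. It does not use Hudi classes: Instant and
visibleInstantTimes are hypothetical stand-ins, and instant times are
assumed to compare lexicographically. It keeps only the instants strictly
before the earliest pending compaction.

```java
import java.util.ArrayList;
import java.util.List;

public class TimelineFilterSketch {

    /** Hypothetical stand-in for a timeline instant; not Hudi's HoodieInstant. */
    static final class Instant {
        final String time;
        final String action;
        final boolean completed;

        Instant(String time, String action, boolean completed) {
            this.time = time;
            this.action = action;
            this.completed = completed;
        }

        boolean isPendingCompaction() {
            return "compaction".equals(action) && !completed;
        }
    }

    /** Returns the times of instants strictly before the earliest pending compaction. */
    static List<String> visibleInstantTimes(List<Instant> timeline) {
        // Find the earliest pending compaction, if there is one.
        String earliestPending = null;
        for (Instant instant : timeline) {
            if (instant.isPendingCompaction()
                    && (earliestPending == null
                        || instant.time.compareTo(earliestPending) < 0)) {
                earliestPending = instant.time;
            }
        }
        // Without a pending compaction, every instant stays visible.
        List<String> visible = new ArrayList<>();
        for (Instant instant : timeline) {
            if (earliestPending == null
                    || instant.time.compareTo(earliestPending) < 0) {
                visible.add(instant.time);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
            new Instant("t0", "commit", true),        // create bucket1.parquet
            new Instant("t1", "deltacommit", true),   // append updates bucket1.log
            new Instant("t2", "compaction", false),   // compaction requested, still pending
            new Instant("t3", "commit", true));       // create bucket2.parquet
        System.out.println(visibleInstantTimes(timeline)); // prints [t0, t1]
    }
}
```

With the compaction at t2 still pending, t3 is hidden, so an incremental
reader cannot move past t2 and skip the updates from t1.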
Overall, I believe all usages of filterPendingCompactionTimeline() are
fine now.
> Revisit all usages of filterPendingCompactionTimeline()
> --------------------------------------------------------
>
> Key: HUDI-3650
> URL: https://issues.apache.org/jira/browse/HUDI-3650
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Yue Zhang
> Priority: Blocker
> Fix For: 0.11.0
>
>
> [https://github.com/apache/hudi/pull/4172/files]
>
> We need to find all usages of filterPendingCompactionTimeline,
> getTimelineOfActions and replace them with new methods.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)