[https://issues.apache.org/jira/browse/HUDI-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

voon updated HUDI-8970:
-----------------------
    Priority: Minor  (was: Major)

> Improve RunCompaction Procedure does not run for all pending compactions when 
> op is scheduleAndExecute
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-8970
>                 URL: https://issues.apache.org/jira/browse/HUDI-8970
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: voon
>            Assignee: voon
>            Priority: Minor
>
> The current op modes for RunCompactionProcedure are as follows:
>  
>  # schedule - schedule a new plan
>  # execute - if specific instants are given, execute them; otherwise, execute 
> all pending plans
>  # scheduleandexecute - schedule a new plan and then execute only that plan; 
> if no plan is generated during scheduling, execute all pending plans
>  
> While the current implementation matches the above specification, it is not 
> very user-friendly.
>  
> There is no option to schedule a new compaction plan and then execute all 
> pending compaction plans. If a previous `scheduleandexecute` run fails and the 
> user's scheduled job retries, the retry generates a new compaction plan and 
> executes only that one, leaving the earlier plan pending on the table. 
>  
> Without user intervention, log files in the affected file groups may keep 
> growing, making the eventual compaction increasingly expensive. 
>  
> This ticket therefore proposes changing the `scheduleandexecute` op to the 
> following behavior:
>  # schedule a new plan and then execute {color:#de350b}*ALL*{color} pending 
> plans; if no plan is generated during scheduling, execute all pending plans
>  
> This would make RunCompactionProcedure more user-friendly for workflows that 
> are triggered on a fixed schedule.
>  
> If users want to limit the number of pending plans executed during ad-hoc 
> runs (for example, to 1), they can still do so with the LIMIT keyword.
>  
> Note that a LIMIT of 1 will not execute the newly scheduled compaction; it 
> will execute the oldest pending plan instead.
>  
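A minimal Java sketch of the difference between the current and proposed `scheduleandexecute` semantics described above, assuming pending plans are identified by instant-time strings that sort chronologically. The class and method names (`CompactionOpSketch`, `currentScheduleAndExecute`, `proposedScheduleAndExecute`) and the convention that a non-positive `limit` means "no limit" are hypothetical; this models only the plan-selection logic and is not Hudi's RunCompactionProcedure code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class CompactionOpSketch {

    // Current behaviour: execute only the newly scheduled plan if one was created;
    // otherwise fall back to every pending plan (oldest first).
    // For simplicity, this sketch does not model LIMIT for the current behaviour.
    static List<String> currentScheduleAndExecute(Optional<String> newPlan, List<String> pending) {
        if (newPlan.isPresent()) {
            return List.of(newPlan.get());
        }
        List<String> all = new ArrayList<>(pending);
        Collections.sort(all);
        return all;
    }

    // Proposed behaviour: execute ALL pending plans, including the newly scheduled one,
    // oldest first. A positive limit (hypothetical convention) keeps only the oldest N plans.
    static List<String> proposedScheduleAndExecute(Optional<String> newPlan, List<String> pending, int limit) {
        List<String> all = new ArrayList<>(pending);
        newPlan.ifPresent(p -> {
            if (!all.contains(p)) {
                all.add(p);
            }
        });
        // Assumes instant times sort lexicographically in chronological order.
        Collections.sort(all);
        if (limit > 0 && limit < all.size()) {
            return new ArrayList<>(all.subList(0, limit));
        }
        return all;
    }

    public static void main(String[] args) {
        // Example values: one plan left over from a failed run, one freshly scheduled plan.
        List<String> pending = List.of("20250101000000000");
        Optional<String> fresh = Optional.of("20250102000000000");
        System.out.println(currentScheduleAndExecute(fresh, pending));
        System.out.println(proposedScheduleAndExecute(fresh, pending, 0));
        System.out.println(proposedScheduleAndExecute(fresh, pending, 1));
    }
}
```

With one plan left over from a failed run, the current behaviour executes only the fresh plan and leaves the old one pending, while the proposed behaviour executes both. With a limit of 1, the proposed behaviour executes only the oldest pending plan, not the one just scheduled, matching the note on LIMIT above.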



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
