Kong Wei created HUDI-6714:
------------------------------

             Summary: HoodieStreamer support only schedule the compaction plan but not execute the plan
                 Key: HUDI-6714
                 URL: https://issues.apache.org/jira/browse/HUDI-6714
             Project: Apache Hudi
          Issue Type: New Feature
            Reporter: Kong Wei
            Assignee: Kong Wei


When HoodieStreamer (aka HoodieDeltaStreamer) writes a MOR table, compaction 
can run in *async* mode.

In async compaction mode, the streamer schedules a compaction plan after each 
write operation and executes the plan if needed. However, compaction execution 
shares the streamer's Spark job resources, which can increase write latency.

In our case, we want to execute compaction offline, both to save Spark 
resources for the streamer and to reduce write latency. However, we found that 
scheduling the compaction plan offline fails while the streamer is writing, so 
we would have to stop the streamer in order to schedule the plan offline. We 
therefore want the streamer to only schedule compaction, without executing it.

Currently, however, the streamer does not seem to support this. If we set 
`--disable-compaction` to true, the streamer no longer schedules compaction 
either.

So I propose adding a parameter named `--enable-schedule-compaction` to the 
streamer. Setting `--disable-compaction`=true together with 
`--enable-schedule-compaction`=true would make the streamer only schedule 
compaction, without executing it.

The cases are as follows:
||Parameters||Schedule plan||Execute plan||
|`--disable-compaction` = false (regardless of `--enable-schedule-compaction`)|true|true|
|`--disable-compaction` = true, `--enable-schedule-compaction` = true|true|false|
|`--disable-compaction` = true, `--enable-schedule-compaction` = false|false|false|
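The proposed decision table can be sketched as a small pure function. This is a hypothetical Python model, not HoodieStreamer code: `resolve_compaction_actions` and `CompactionActions` are illustrative names. It assumes `--disable-compaction` keeps its existing meaning (when set, the streamer does not execute compaction) and that `--enable-schedule-compaction` is the new flag proposed in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionActions:
    """Hypothetical model of what the streamer does after each write."""
    schedule_plan: bool
    execute_plan: bool


def resolve_compaction_actions(disable_compaction: bool,
                               enable_schedule_compaction: bool) -> CompactionActions:
    """Map the two streamer flags to the proposed behaviour (illustrative only).

    Assumes --disable-compaction keeps its current meaning and that
    --enable-schedule-compaction is the new flag proposed in HUDI-6714.
    """
    if not disable_compaction:
        # Default async mode: schedule a plan after each write and execute it
        # inside the streamer's own Spark job; the new flag has no effect.
        return CompactionActions(schedule_plan=True, execute_plan=True)
    # Compaction execution is disabled in the streamer: it only schedules
    # plans when explicitly asked to, leaving execution to an offline job.
    return CompactionActions(schedule_plan=enable_schedule_compaction,
                             execute_plan=False)


if __name__ == "__main__":
    # Print every flag combination, mirroring the rows of the table.
    for disable in (False, True):
        for schedule in (False, True):
            actions = resolve_compaction_actions(disable, schedule)
            print(f"disable={disable!s:5} schedule-flag={schedule!s:5} -> {actions}")
```

Keeping the mapping in one place makes the "schedule only" combination explicit: the new flag only changes behaviour when `--disable-compaction` is set.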

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
