Kong Wei created HUDI-6714:
------------------------------
Summary: HoodieStreamer: support only scheduling the compaction plan without executing it
Key: HUDI-6714
URL: https://issues.apache.org/jira/browse/HUDI-6714
Project: Apache Hudi
Issue Type: New Feature
Reporter: Kong Wei
Assignee: Kong Wei
For HoodieStreamer (aka HoodieDeltaStreamer) writing a MOR table, the compaction
mode can be *async*.
In async compaction mode, the streamer schedules a compaction plan after each
write operation and executes pending plans when needed. However, compaction
execution shares the Spark job's resources with ingestion, which can increase
write latency.
In our case, we want to execute compaction offline to free Spark resources for
the streamer and reduce write latency. We found that scheduling the compaction
plan offline fails while the streamer is writing, which means we would have to
stop the streamer in order to schedule the plan offline. So we want the streamer
only to schedule compaction, not to execute it.
Currently the streamer does not seem to support this case: if we set
`--disable-compaction` to true, the streamer no longer schedules compaction
either.
So I propose adding a parameter named `--enable-schedule-compaction` to the
streamer. Setting `--disable-compaction`=true together with
`--enable-schedule-compaction`=true would make the streamer only schedule
compaction plans.
The resulting behavior:
||Parameters||Schedules plan||Executes plan||
|--disable-compaction = false (any value of --enable-schedule-compaction)|true|true|
|--disable-compaction = true, --enable-schedule-compaction = true|true|false|
|--disable-compaction = true, --enable-schedule-compaction = false|false|false|
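To make the proposed flag combinations concrete, here is a minimal Java sketch of how the two options could be resolved into a compaction behavior. It assumes the existing HoodieStreamer meaning of `--disable-compaction` (false by default; setting it to true turns compaction off) and treats `--enable-schedule-compaction` as the new flag proposed in this issue. `CompactionFlags`, `CompactionMode` and `resolve` are hypothetical names for illustration, not Hudi APIs.

```java
// Hypothetical sketch: none of these names exist in Hudi; they only illustrate
// how the proposed flag combinations could map to streamer behavior.
public class CompactionFlags {

    /** What the streamer would do with compaction after each write (illustrative). */
    enum CompactionMode {
        SCHEDULE_AND_EXECUTE(true, true),
        SCHEDULE_ONLY(true, false),
        NONE(false, false);

        final boolean schedulesPlan;
        final boolean executesPlan;

        CompactionMode(boolean schedulesPlan, boolean executesPlan) {
            this.schedulesPlan = schedulesPlan;
            this.executesPlan = executesPlan;
        }
    }

    /**
     * @param disableCompaction        value of --disable-compaction (true turns compaction off)
     * @param enableScheduleCompaction value of the proposed --enable-schedule-compaction
     */
    static CompactionMode resolve(boolean disableCompaction, boolean enableScheduleCompaction) {
        if (!disableCompaction) {
            // Compaction enabled: schedule and execute; the new flag is ignored.
            return CompactionMode.SCHEDULE_AND_EXECUTE;
        }
        // Compaction disabled in the streamer: optionally keep scheduling plans
        // so that a separate offline job can execute them.
        return enableScheduleCompaction ? CompactionMode.SCHEDULE_ONLY : CompactionMode.NONE;
    }

    public static void main(String[] args) {
        // Print every flag combination and the resulting behavior.
        for (boolean disable : new boolean[] {false, true}) {
            for (boolean schedule : new boolean[] {false, true}) {
                CompactionMode mode = resolve(disable, schedule);
                System.out.printf(
                        "--disable-compaction=%b --enable-schedule-compaction=%b -> schedule=%b execute=%b%n",
                        disable, schedule, mode.schedulesPlan, mode.executesPlan);
            }
        }
    }
}
```

Keeping execution off whenever `--disable-compaction` is set means the new flag can only add scheduling, never re-enable in-process execution, so existing jobs that disable compaction behave as before unless they opt in.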