[
https://issues.apache.org/jira/browse/HUDI-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kong Wei updated HUDI-6714:
---------------------------
Description:
For HoodieStreamer (aka HoodieDeltaStreamer) writing to a MOR table, the
compaction mode can be *async*.
In async compaction mode, the streamer schedules one compaction plan after
each write operation and executes the plan if needed. But compaction
execution shares the Spark job's resources with the writer, which may delay
writes.
In our case, we want to execute the compaction offline to save Spark
resources for the streamer and reduce write latency. We found that
scheduling the compaction plan offline fails while the streamer is writing
(meaning we have to stop the streamer in order to schedule the plan offline).
So we want the streamer to only schedule the compaction, not execute it.
But currently the streamer does not seem to support this case. If we set
`--disable-compaction` to false, the streamer will not schedule the compaction
anymore.
So I want to add a param named --{_}enable-schedule-compaction{_} to the
streamer, so that we can set --{_}disable-compaction{_}=false and
--{_}enable-schedule-compaction{_}=true to make the streamer only schedule the
compaction.
The cases are as below:
||param case||schedule plan||execute plan||
|--disable-compaction = true, regardless of --enable-schedule-compaction|true|true|
|--disable-compaction = false, --enable-schedule-compaction = true|true|false|
|--disable-compaction = false, --enable-schedule-compaction = false|false|false|
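The three rows of the table can be read as a small decision function. The sketch below is only a literal transcription of the table as written in this description, not HoodieStreamer code. The names resolve_compaction_actions and CompactionActions are hypothetical and used for illustration only.

```python
from typing import NamedTuple


class CompactionActions(NamedTuple):
    """What the streamer would do, per the table's two result columns."""
    schedule_plan: bool
    execute_plan: bool


def resolve_compaction_actions(disable_compaction: bool,
                               enable_schedule_compaction: bool) -> CompactionActions:
    """Map the two flag values to (schedule plan, execute plan).

    Hypothetical helper: it mirrors the table rows exactly as written
    in the description and is not taken from the Hudi code base.
    """
    if disable_compaction:
        # Row 1: --enable-schedule-compaction is ignored in this case.
        return CompactionActions(schedule_plan=True, execute_plan=True)
    if enable_schedule_compaction:
        # Row 2: the streamer only schedules; execution happens offline.
        return CompactionActions(schedule_plan=True, execute_plan=False)
    # Row 3: the streamer neither schedules nor executes.
    return CompactionActions(schedule_plan=False, execute_plan=False)
```

The schedule-only case this issue asks for corresponds to resolve_compaction_actions(False, True), where the plan is scheduled by the streamer and left for an offline job to execute.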
was:
For HoodieStreamer (aka HoodieDeltaStreamer) writing to a MOR table, the
compaction mode can be *async*.
In async compaction mode, the streamer schedules one compaction plan after
each write operation and executes the plan if needed. But compaction
execution shares the Spark job's resources with the writer, which may delay
writes.
In our case, we want to execute the compaction offline to save Spark
resources for the streamer and reduce write latency. We found that
scheduling the compaction plan offline fails while the streamer is writing
(meaning we have to stop the streamer in order to schedule the plan offline).
So we want the streamer to only schedule the compaction, not execute it.
But currently the streamer does not seem to support this case. If we set
`--disable-compaction` to false, the streamer will not schedule the compaction
anymore.
So I want to add a param named `--enable-schedule-compaction` to the streamer,
so that we can set `--disable-compaction`=false and
`--enable-schedule-compaction`=true to make the streamer only schedule the
compaction.
The cases are as below:
||param case||schedule plan||execute plan||
|--disable-compaction = true, regardless of --enable-schedule-compaction|true|true|
|--disable-compaction = false, --enable-schedule-compaction = true|true|false|
|--disable-compaction = false, --enable-schedule-compaction = false|false|false|
> HoodieStreamer support only schedule the compaction plan but not execute the
> plan
> ---------------------------------------------------------------------------------
>
> Key: HUDI-6714
> URL: https://issues.apache.org/jira/browse/HUDI-6714
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: Kong Wei
> Assignee: Kong Wei
> Priority: Major
>