[ 
https://issues.apache.org/jira/browse/HUDI-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kong Wei updated HUDI-6714:
---------------------------
    Description: 
For HoodieStreamer (aka HoodieDeltaStreamer) writing to a MOR table, the 
compaction mode can be *async*.

In async compaction mode, the streamer schedules one compaction plan after 
each write operation and executes the plan if needed. But compaction 
execution shares the Spark job's resources with the writer, which can delay 
writes.

In our case, we want to execute compaction offline to save Spark resources 
for the streamer and reduce write latency. We found that scheduling the 
compaction plan offline fails while the streamer is writing (that is, we have 
to stop the streamer in order to schedule the plan offline). So we want the 
streamer only to schedule the compaction, not to execute it.

But currently the streamer does not seem to support this case. If we set 
`--disable-compaction` to false, the streamer will not schedule the compaction 
anymore.

So I want to add a parameter named --{_}enable-schedule-compaction{_} to the 
streamer, so that we can set --{_}disable-compaction{_}=false and 
--{_}enable-schedule-compaction{_}=true to make the streamer only schedule the 
compaction.

The cases are as follows:
||param case||schedule plan||execute plan||
|--disable-compaction = true, --enable-schedule-compaction = any value|true|true|
|--disable-compaction = false, --enable-schedule-compaction = true|true|false|
|--disable-compaction = false, --enable-schedule-compaction = false|false|false|
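
The case table above can be read as a small decision function. The sketch below is only an illustration and is not taken from the Hudi code base: the names `CompactionActions` and `resolve_compaction_actions` are hypothetical, and the mapping follows the table exactly as it is written in this description, without checking it against the streamer's actual handling of `--disable-compaction`.

```python
from typing import NamedTuple


class CompactionActions(NamedTuple):
    """What the streamer would do after a write (hypothetical type)."""
    schedule_plan: bool
    execute_plan: bool


def resolve_compaction_actions(disable_compaction: bool,
                               enable_schedule_compaction: bool) -> CompactionActions:
    """Map the two flag values to the actions listed in the case table.

    Assumption: this mirrors the table in the issue description row by row;
    it is not Hudi's implementation.
    """
    if disable_compaction:
        # Row 1: --enable-schedule-compaction is ignored in this case.
        return CompactionActions(schedule_plan=True, execute_plan=True)
    if enable_schedule_compaction:
        # Row 2: schedule the plan only, leave execution to an offline job.
        return CompactionActions(schedule_plan=True, execute_plan=False)
    # Row 3: neither schedule nor execute.
    return CompactionActions(schedule_plan=False, execute_plan=False)


if __name__ == "__main__":
    for disable in (True, False):
        for enable in (True, False):
            print(disable, enable, resolve_compaction_actions(disable, enable))
```

The schedule-only case requested in this issue is the second row: the plan is scheduled by the streamer, and execution is left to a separate offline job.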

 


> HoodieStreamer support only schedule the compaction plan but not execute the 
> plan
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-6714
>                 URL: https://issues.apache.org/jira/browse/HUDI-6714
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Kong Wei
>            Assignee: Kong Wei
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
