[jira] [Commented] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async

Nishith Agarwal (Jira) Wed, 28 Apr 2021 16:24:04 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335061#comment-17335061
 ]


Nishith Agarwal commented on HUDI-1847:
---------------------------------------

Steps to contribute this PR:

 
 # Start by adding a config that controls whether compaction is SCHEDULED 
inline, so that users can turn off inline compaction execution and still choose 
whether to schedule inline. This can be added here -> 
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java]
 # Next, this config needs to be added to 
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java]
 so that it is exposed through a getter
 # This config then needs to be honored everywhere compaction is 
scheduled. Good places to look are: AbstractHoodieWriteClient, 
DeltaSync/HoodieDeltaStreamer and HoodieSparkSqlWriter.scala
 # Once this config is honored, write test cases for each of these parts of 
the code to verify the feature

> Add ability to decouple configs for scheduling inline and running async
> -----------------------------------------------------------------------
>
>                 Key: HUDI-1847
>                 URL: https://issues.apache.org/jira/browse/HUDI-1847
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Nishith Agarwal
>            Priority: Major
>              Labels: sev:high
>
> Currently, there are 2 ways to enable compaction:
>  
>  # Inline - This will schedule compaction inline and execute inline
>  # Async - This option is only available for HoodieDeltaStreamer based jobs. 
> This turns on scheduling inline and running async as part of the same Spark 
> job.
>  
> Users need a config that only schedules compaction inline, so that they can 
> execute it in their own Spark job



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
