[
https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335061#comment-17335061
]
Nishith Agarwal commented on HUDI-1847:
---------------------------------------
Steps to contribute this PR
# Start by adding a config that controls whether compaction is SCHEDULED
inline, so that inline compaction can be turned off while scheduling still
happens inline (or not). This can be added here ->
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java]
# Next, this config needs to be added to
[https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java]
so that it is exposed through a getter
# This config then needs to be honored in all the places where compaction is
scheduled. Good places to look are: AbstractHoodieWriteClient,
DeltaSync/HoodieDeltaStreamer and HoodieSparkSqlWriter.scala
# Once this config is honored, you should be able to write test cases for each
of these parts of the code to verify the feature
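
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Hudi change: the class CompactionFlags is hypothetical and stands in for the getters that would live in HoodieCompactionConfig/HoodieWriteConfig, and the key name "hoodie.compact.schedule.inline" is an assumed name for the new config. The idea is that scheduling happens inline when either flag is on, while execution happens inline only when full inline compaction is on.

```java
import java.util.Properties;

// Hypothetical stand-in for the config getters; NOT part of Hudi.
final class CompactionFlags {
    // Key for full inline compaction (schedule + execute inline).
    static final String INLINE_COMPACT = "hoodie.compact.inline";
    // Assumed key name for the new schedule-only flag (illustrative).
    static final String SCHEDULE_INLINE_COMPACT = "hoodie.compact.schedule.inline";

    private final Properties props;

    CompactionFlags(Properties props) {
        this.props = props;
    }

    boolean inlineCompaction() {
        return Boolean.parseBoolean(props.getProperty(INLINE_COMPACT, "false"));
    }

    boolean scheduleInlineCompaction() {
        return Boolean.parseBoolean(props.getProperty(SCHEDULE_INLINE_COMPACT, "false"));
    }

    // Schedule inline if full inline compaction or schedule-only is enabled.
    boolean shouldScheduleInline() {
        return inlineCompaction() || scheduleInlineCompaction();
    }

    // Execute inline only when full inline compaction is enabled.
    boolean shouldExecuteInline() {
        return inlineCompaction();
    }
}

final class CompactionFlagsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(CompactionFlags.INLINE_COMPACT, "false");
        props.setProperty(CompactionFlags.SCHEDULE_INLINE_COMPACT, "true");

        CompactionFlags flags = new CompactionFlags(props);
        System.out.println("schedule inline: " + flags.shouldScheduleInline());
        System.out.println("execute inline: " + flags.shouldExecuteInline());
    }
}
```

With this split, each place that schedules compaction would check the scheduling decision, and only the inline execution path would check the execution decision, so a separate Spark job can pick up the scheduled compaction.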
> Add ability to decouple configs for scheduling inline and running async
> -----------------------------------------------------------------------
>
> Key: HUDI-1847
> URL: https://issues.apache.org/jira/browse/HUDI-1847
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Compaction
> Reporter: Nishith Agarwal
> Priority: Major
> Labels: sev:high
>
> Currently, there are 2 ways to enable compaction:
>
> # Inline - This will schedule compaction inline and execute inline
> # Async - This option is only available for HoodieDeltaStreamer-based jobs.
> It schedules compaction inline and runs it asynchronously as part of the same
> Spark job.
>
> Users need a config that schedules compaction inline only, while leaving them
> able to execute it in their own Spark job