[
https://issues.apache.org/jira/browse/HUDI-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375628#comment-17375628
]
ASF GitHub Bot commented on HUDI-2135:
--------------------------------------
garyli1019 commented on a change in pull request #3226:
URL: https://github.com/apache/hudi/pull/3226#discussion_r664203627
##########
File path:
hudi-flink/src/main/java/org/apache/hudi/sink/compact/FlinkCompactionConfig.java
##########
@@ -89,6 +89,9 @@
@Parameter(names = {"--compaction-tasks"}, description = "Parallelism of
tasks that do actual compaction, default is -1", required = false)
public Integer compactionTasks = -1;
+ @Parameter(names = {"--schedule", "-sc"}, description = "Schedule
compaction", required = false)
Review comment:
Should we add some comments that async scheduling is not recommended and
the possible risk? Most users won't dig this deep
##########
File path:
hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
##########
@@ -75,15 +75,28 @@ public static void main(String[] args) throws Exception {
// judge whether have operation
// to compute the compaction instant time and do compaction.
- String compactionInstantTime =
CompactionUtil.getCompactionInstantTime(metaClient);
- boolean scheduled =
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
- if (!scheduled) {
+ if (cfg.schedule) {
+ String compactionInstantTime =
CompactionUtil.getCompactionInstantTime(metaClient);
+ boolean scheduled =
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
+ if (!scheduled) {
+ // do nothing.
+ LOG.info("No compaction plan for this job ");
+ return;
+ }
+ }
+
+ table.getMetaClient().reloadActiveTimeline();
+
+ // the last instant takes the highest priority
+ Option<HoodieInstant> lastRequested =
table.getActiveTimeline().filterPendingCompactionTimeline()
Review comment:
should we also make this configurable? First in first out or last in
first out.
##########
File path:
hudi-flink/src/main/java/org/apache/hudi/sink/compact/HoodieFlinkCompactor.java
##########
@@ -75,15 +75,29 @@ public static void main(String[] args) throws Exception {
// judge whether have operation
// to compute the compaction instant time and do compaction.
- String compactionInstantTime =
CompactionUtil.getCompactionInstantTime(metaClient);
- boolean scheduled =
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
- if (!scheduled) {
+ if (cfg.schedule) {
+ String compactionInstantTime =
CompactionUtil.getCompactionInstantTime(metaClient);
+ boolean scheduled =
writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty());
+ if (!scheduled) {
+ // do nothing.
+ LOG.info("No compaction plan for this job ");
+ return;
+ }
+ }
+
+ table.getMetaClient().reloadActiveTimeline();
+
+ // the last instant takes the highest priority
Review comment:
outdated comment
##########
File path:
hudi-flink/src/main/java/org/apache/hudi/sink/compact/FlinkCompactionConfig.java
##########
@@ -89,12 +89,23 @@
@Parameter(names = {"--compaction-tasks"}, description = "Parallelism of
tasks that do actual compaction, default is -1", required = false)
public Integer compactionTasks = -1;
+ @Parameter(names = {"--schedule", "-sc"}, description = "Schedule the
compaction plan, there is risk to lost data when this option turns on,\n"
Review comment:
Not recommended. Schedule the compaction plan in this job. There is a
risk of losing data when scheduling compaction outside the writer job.
Scheduling compaction in the writer job and only let this job do the compaction
execution is recommended. Default is false.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
> Add compaction schedule option for flink
> ----------------------------------------
>
> Key: HUDI-2135
> URL: https://issues.apache.org/jira/browse/HUDI-2135
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)