voonhous commented on code in PR #6566:
URL: https://github.com/apache/hudi/pull/6566#discussion_r962104627
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/FlinkClusteringConfig.java:
##########
@@ -69,13 +83,14 @@ public class FlinkClusteringConfig extends Configuration {
required = false)
public Integer archiveMaxCommits = 30;
- @Parameter(names = {"--schedule", "-sc"}, description = "Not recommended. Schedule the clustering plan in this job.\n"
- + "There is a risk of losing data when scheduling clustering outside the writer job.\n"
- + "Scheduling clustering in the writer job and only let this job do the clustering execution is recommended.\n"
- + "Default is true", required = false)
- public Boolean schedule = true;
+ @Parameter(names = {"--schedule", "-sc"}, description = "Schedule the clustering plan in this job.\n"
+ + "Default is false", required = false)
+ public Boolean schedule = false;
+
+ @Parameter(names = {"--instant-time", "-it"}, description = "Clustering Instant time")
+ public String clusteringInstantTime = null;
Review Comment:
From `HoodieClusteringJob.java`:
```java
@Parameter(names = {"--instant-time", "-it"}, description = "Clustering Instant time, only used when set --mode execute. "
    + "If the instant time is not provided with --mode execute, "
    + "the earliest scheduled clustering instant time is used by default. "
    + "When set \"--mode scheduleAndExecute\" this instant-time will be ignored.")
public String clusteringInstantTime = null;
```
Should we standardise this parameter? Since the Spark job already uses
`--instant-time`, both options should be kept the same to avoid
confusion.
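For reference, here is a minimal sketch of the instant-selection rule that the Spark `--instant-time` description spells out, assuming the Flink job adopted the same semantics with its `schedule` flag standing in for `--mode scheduleAndExecute` (and `schedule = false` for `--mode execute`). `InstantTimeResolution` and `resolveInstant` are hypothetical names for illustration, not Hudi APIs, and the sketch assumes all pending instants share the same fixed-width timestamp format.

```java
import java.util.List;
import java.util.Optional;

public class InstantTimeResolution {

  // Hypothetical helper mirroring the documented semantics of HoodieClusteringJob's --instant-time.
  // scheduleInThisJob == true plays the role of "--mode scheduleAndExecute";
  // scheduleInThisJob == false plays the role of "--mode execute".
  public static Optional<String> resolveInstant(boolean scheduleInThisJob,
                                               String requestedInstant,
                                               List<String> pendingInstants) {
    if (scheduleInThisJob) {
      // The job executes the plan it schedules itself, so a user-supplied instant is ignored.
      // Empty here means "use the instant this job schedules".
      return Optional.empty();
    }
    if (requestedInstant != null) {
      // An explicitly requested instant takes precedence in execute-only mode.
      return Optional.of(requestedInstant);
    }
    // Otherwise fall back to the earliest pending clustering instant. With fixed-width
    // timestamps (assumed), lexicographic order matches chronological order.
    return pendingInstants.stream().min(String::compareTo);
  }

  public static void main(String[] args) {
    List<String> pending = List.of("20220905120000", "20220901093000");
    System.out.println(resolveInstant(false, null, pending));
    System.out.println(resolveInstant(false, "20220905120000", pending));
    System.out.println(resolveInstant(true, "20220905120000", pending));
  }
}
```

Keeping one resolution rule shared by both engines would make `--instant-time` mean the same thing regardless of which job a user launches.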