zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111402
##########
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##########
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws Exception {
return client.scheduleClustering(Option.empty());
}
}
+
+ @TestOnly
+ public int doScheduleAndCluster() throws Exception {
+ return this.doScheduleAndCluster(jsc);
+ }
+
+ public int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
+ LOG.info("Step 1: Do schedule");
+ String schemaStr = getSchemaFromLatestInstant();
+ try (SparkRDDWriteClient client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+
+ Option<String> instantTime;
+ if (cfg.clusteringInstantTime != null) {
+ client.scheduleClusteringAtInstant(cfg.clusteringInstantTime, Option.empty());
+ instantTime = Option.of(cfg.clusteringInstantTime);
+ } else {
+ instantTime = client.scheduleClustering(Option.empty());
+ }
+
+ int result = instantTime.isPresent() ? 0 : -1;
Review comment:
doSchedule() and doCluster() already exist. However, if
doScheduleAndCluster() simply calls doSchedule() and doCluster(), it will
start and stop SparkRDDWriteClient twice, which is expensive and unnecessary.
It would be better for the schedule and cluster actions to share a single
SparkRDDWriteClient.
For example, with two clients the Timeline service would be started and
stopped twice:
```
21/07/15 11:05:11 INFO EmbeddedTimelineService: Starting Timeline service !!
21/07/15 11:05:11 INFO EmbeddedTimelineService: Overriding hostIp to (localhost) found in spark-conf. It was null
21/07/15 11:05:11 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
21/07/15 11:05:11 INFO FileSystemViewManager: Creating in-memory based Table View
21/07/15 11:05:11 INFO log: Logging initialized @4500ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/07/15 11:05:11 INFO Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
21/07/15 11:05:11 INFO Javalin: Starting Javalin ...
```
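To make the cost difference concrete, here is a minimal, self-contained
sketch. It does not use the Hudi API: `FakeClient` is a hypothetical stand-in
for an expensive AutoCloseable client such as SparkRDDWriteClient, and the
instant time it returns is a made-up value. It only counts how often the
client is started and stopped under each approach.

```java
import java.util.Optional;

public class SharedClientSketch {

  /** Hypothetical stand-in for an expensive client; NOT the real SparkRDDWriteClient. */
  public static final class FakeClient implements AutoCloseable {
    public static int starts = 0;
    public static int stops = 0;

    public FakeClient() {
      starts++; // models the expensive startup (e.g. an embedded service)
    }

    public static void reset() {
      starts = 0;
      stops = 0;
    }

    /** Pretends to schedule clustering; the returned instant time is made up. */
    public Optional<String> scheduleClustering() {
      return Optional.of("20210715110511");
    }

    /** Pretends to run clustering for the given instant; a no-op in this sketch. */
    public void cluster(String instantTime) {
    }

    @Override
    public void close() {
      stops++; // models the matching shutdown
    }
  }

  /** Each step opens and closes its own client: two starts and two stops. */
  public static int scheduleThenClusterSeparately() {
    Optional<String> instantTime;
    try (FakeClient client = new FakeClient()) {
      instantTime = client.scheduleClustering();
    }
    if (!instantTime.isPresent()) {
      return -1;
    }
    try (FakeClient client = new FakeClient()) {
      client.cluster(instantTime.get());
    }
    return 0;
  }

  /** Both steps share one client: one start and one stop. */
  public static int scheduleAndClusterShared() {
    try (FakeClient client = new FakeClient()) {
      Optional<String> instantTime = client.scheduleClustering();
      if (!instantTime.isPresent()) {
        return -1;
      }
      client.cluster(instantTime.get());
      return 0;
    }
  }

  public static void main(String[] args) {
    FakeClient.reset();
    scheduleThenClusterSeparately();
    System.out.println("separate: starts=" + FakeClient.starts + ", stops=" + FakeClient.stops);

    FakeClient.reset();
    scheduleAndClusterShared();
    System.out.println("shared: starts=" + FakeClient.starts + ", stops=" + FakeClient.stops);
  }
}
```

In the shared variant the try-with-resources block spans both steps, so the
client is closed exactly once, even when scheduling produces no instant and
the method returns early.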