zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r670111402



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##########
@@ -171,4 +200,38 @@ private int doCluster(JavaSparkContext jsc) throws Exception {
       return client.scheduleClustering(Option.empty());
     }
   }
+
+  @TestOnly
+  public int doScheduleAndCluster() throws Exception {
+    return this.doScheduleAndCluster(jsc);
+  }
+
+  public int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
+    LOG.info("Step 1: Do schedule");
+    String schemaStr = getSchemaFromLatestInstant();
+    try (SparkRDDWriteClient client = UtilHelpers.createHoodieClient(jsc, cfg.basePath, schemaStr, cfg.parallelism, Option.empty(), props)) {
+
+      Option<String> instantTime;
+      if (cfg.clusteringInstantTime != null) {
+        client.scheduleClusteringAtInstant(cfg.clusteringInstantTime, Option.empty());
+        instantTime = Option.of(cfg.clusteringInstantTime);
+      } else {
+        instantTime = client.scheduleClustering(Option.empty());
+      }
+
+      int result = instantTime.isPresent() ? 0 : -1;

Review comment:
   Actually, doSchedule() and doCluster() already exist. But if
   doScheduleAndCluster() called doSchedule() and doCluster() directly, it
   would start and stop a SparkRDDWriteClient twice, which is expensive and
   unnecessary.
   
   It is probably better to let the schedule action and the cluster action
   share a single SparkRDDWriteClient.
   
   For example, calling them separately starts and stops the Timeline service twice:
   ```
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Starting Timeline service !!
   21/07/15 11:05:11 INFO EmbeddedTimelineService: Overriding hostIp to (localhost) found in spark-conf. It was null
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
   21/07/15 11:05:11 INFO FileSystemViewManager: Creating in-memory based Table View
   21/07/15 11:05:11 INFO log: Logging initialized @4500ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
   21/07/15 11:05:11 INFO Javalin: 
              __                      __ _
             / /____ _ _   __ ____ _ / /(_)____
        __  / // __ `/| | / // __ `// // // __ \
       / /_/ // /_/ / | |/ // /_/ // // // / / /
       \____/ \__,_/  |___/ \__,_//_//_//_/ /_/
   
           https://javalin.io/documentation
   
   21/07/15 11:05:11 INFO Javalin: Starting Javalin ...
   ```
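
To make the cost difference concrete, here is a minimal sketch of the two options. It does not use the real Hudi classes: `StubClient` is a hypothetical stand-in for SparkRDDWriteClient that only counts how often it is started and stopped, and the instant time and method names on the stub are made up for illustration.

```java
import java.util.Optional;

public class SharedClientSketch {

  // Hypothetical stand-in for SparkRDDWriteClient (NOT the Hudi API).
  // It only counts how many times the costly start/stop happens.
  static class StubClient implements AutoCloseable {
    static int starts = 0;
    static int stops = 0;

    StubClient() {
      starts++; // stands in for starting the embedded timeline service
    }

    Optional<String> scheduleClustering() {
      return Optional.of("20210715110511"); // made-up instant time
    }

    int cluster(String instantTime) {
      return 0; // 0 = success, same convention as in the diff
    }

    @Override
    public void close() {
      stops++; // stands in for stopping the timeline service
    }
  }

  // Option A: two independent steps, each opening and closing its own client.
  static int scheduleThenClusterSeparately() {
    Optional<String> instant;
    try (StubClient client = new StubClient()) {
      instant = client.scheduleClustering();
    }
    if (!instant.isPresent()) {
      return -1;
    }
    try (StubClient client = new StubClient()) {
      return client.cluster(instant.get());
    }
  }

  // Option B: both steps share one client, so it starts and stops once.
  static int scheduleAndClusterShared() {
    try (StubClient client = new StubClient()) {
      Optional<String> instant = client.scheduleClustering();
      if (!instant.isPresent()) {
        return -1;
      }
      return client.cluster(instant.get());
    }
  }

  public static void main(String[] args) {
    scheduleThenClusterSeparately();
    System.out.println("separate: starts=" + StubClient.starts); // prints "separate: starts=2"

    StubClient.starts = 0;
    StubClient.stops = 0;
    scheduleAndClusterShared();
    System.out.println("shared: starts=" + StubClient.starts); // prints "shared: starts=1"
  }
}
```

With the real client, each extra start would repeat the Timeline service and Javalin startup shown in the log above, which is why the shared variant seems preferable here.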



