majian1998 commented on code in PR #10048:
URL: https://github.com/apache/hudi/pull/10048#discussion_r1390560977


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/multitable/ClusteringTask.java:
##########
@@ -43,13 +44,18 @@ class ClusteringTask extends TableServiceTask {
    */
   private String clusteringMode;
 
+  /**
+   * Meta Client.
+   */
+  private HoodieTableMetaClient metaClient;
+
   @Override
   void run() {
     HoodieClusteringJob.Config clusteringConfig = new HoodieClusteringJob.Config();
     clusteringConfig.basePath = basePath;
     clusteringConfig.parallelism = parallelism;
     clusteringConfig.runningMode = clusteringMode;
-    new HoodieClusteringJob(jsc, clusteringConfig, props).cluster(retry);
+    new HoodieClusteringJob(jsc, clusteringConfig, props, metaClient).cluster(retry);

Review Comment:
   That's right. In the existing code, the meta client is reloaded during clustering, but a new meta client is created during compaction. A better implementation would give compaction the same, consistent behavior: the meta client should be reloaded at the point of use as well.
   ```
   private int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
     LOG.info("Step 1: Do schedule");
     metaClient = HoodieTableMetaClient.reload(metaClient);
     ...
   }
   ```
   
   ```
   private int doCompact(JavaSparkContext jsc) throws Exception {
     ...
     if (StringUtils.isNullOrEmpty(cfg.compactionInstantTime)) {
       HoodieTableMetaClient metaClient = UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
       ...
     }
   }
   ```
   I will make modifications here.
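   To illustrate the consistency argument above, here is a minimal, self-contained sketch of the proposed pattern: both task paths refresh the same cached client at the point of use, instead of one path reloading it and the other constructing a fresh instance. `MetaClient` and `ReloadPatternSketch` are hypothetical stand-ins for illustration only, not the real `HoodieTableMetaClient` API.
   ```java
   // "MetaClient" is a hypothetical stand-in for HoodieTableMetaClient.
   final class MetaClient {
     final int version; // pretend this tracks the on-disk table state

     MetaClient(int version) {
       this.version = version;
     }

     // analogous to HoodieTableMetaClient.reload(metaClient)
     static MetaClient reload(MetaClient old) {
       return new MetaClient(old.version + 1);
     }
   }

   public class ReloadPatternSketch {
     private static MetaClient metaClient = new MetaClient(0);

     // clustering path: reload the shared client at the point of use
     static int doScheduleAndCluster() {
       metaClient = MetaClient.reload(metaClient);
       return metaClient.version;
     }

     // compaction path: same behavior, rather than creating a brand-new client
     static int doCompact() {
       metaClient = MetaClient.reload(metaClient);
       return metaClient.version;
     }

     public static void main(String[] args) {
       System.out.println(doScheduleAndCluster()); // prints 1
       System.out.println(doCompact());            // prints 2
     }
   }
   ```
   With both paths funneling through the same reload call, neither task can observe a stale view of the timeline that the other has already refreshed past.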




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
