majian1998 commented on code in PR #10048:
URL: https://github.com/apache/hudi/pull/10048#discussion_r1390560977
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/multitable/ClusteringTask.java:
##########
@@ -43,13 +44,18 @@ class ClusteringTask extends TableServiceTask {
*/
private String clusteringMode;
+ /**
+ * Meta Client.
+ */
+ private HoodieTableMetaClient metaClient;
+
@Override
void run() {
    HoodieClusteringJob.Config clusteringConfig = new HoodieClusteringJob.Config();
clusteringConfig.basePath = basePath;
clusteringConfig.parallelism = parallelism;
clusteringConfig.runningMode = clusteringMode;
- new HoodieClusteringJob(jsc, clusteringConfig, props).cluster(retry);
+    new HoodieClusteringJob(jsc, clusteringConfig, props, metaClient).cluster(retry);
Review Comment:
That's right. In the existing code, the meta client is reloaded during clustering, but a new meta client is created during compaction. A better implementation would make compaction behave consistently as well: the meta client should be reloaded at the point where it is needed.
```
private int doScheduleAndCluster(JavaSparkContext jsc) throws Exception {
LOG.info("Step 1: Do schedule");
metaClient = HoodieTableMetaClient.reload(metaClient);
...
}
```
```
  private int doCompact(JavaSparkContext jsc) throws Exception {
...
if (StringUtils.isNullOrEmpty(cfg.compactionInstantTime)) {
      HoodieTableMetaClient metaClient =
          UtilHelpers.createMetaClient(jsc, cfg.basePath, true);
      ...
}
```
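To make the two paths consistent, the compaction path could reload the shared meta client instead of creating a fresh one, mirroring `doScheduleAndCluster`. A rough sketch of that change (assuming `doCompact` has access to the same `metaClient` field used by the clustering path):
```
  private int doCompact(JavaSparkContext jsc) throws Exception {
    // Sketch: reload the existing meta client, as doScheduleAndCluster does,
    // rather than creating a new one via UtilHelpers.createMetaClient.
    metaClient = HoodieTableMetaClient.reload(metaClient);
    ...
  }
```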
I will make modifications here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]