codope commented on a change in pull request #4454:
URL: https://github.com/apache/hudi/pull/4454#discussion_r775820710
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/ClusteringPlanStrategy.java
##########
@@ -56,6 +59,37 @@
private final transient HoodieEngineContext engineContext;
private final HoodieWriteConfig writeConfig;
+ /**
+ * Check if the given class is deprecated.
+ * If it is, then try to convert it to suitable one and update the write
config accordingly.
+ * @param config write config
+ * @return class name of clustering plan strategy
+ */
+ public static String checkAndGetClusteringPlanStrategy(HoodieWriteConfig
config) {
+ String className = config.getClusteringPlanStrategyClass();
+ String sparkSizeBasedClassName =
"org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy";
+ String sparkSelectedPartitionsClassName =
"org.apache.hudi.client.clustering.plan.strategy.SparkSelectedPartitionsClusteringPlanStrategy";
+ String sparkRecentDaysClassName =
"org.apache.hudi.client.clustering.plan.strategy.SparkRecentDaysClusteringPlanStrategy";
+ String javaSelectedPartionClassName =
"org.apache.hudi.client.clustering.plan.strategy.JavaRecentDaysClusteringPlanStrategy";
Review comment:
Rename variable to `javaRecentDaysClassName`? Or even better if we can
reuse the class constants in `HoodieClusteringConfig`.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ClusteringPlanPartitionFilter.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.action.cluster;
+
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieClusteringException;
+
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Partition filter utilities. Currently, we support three mode:
+ * NONE: skip filter
+ * RECENT DAYS: output recent partition given skip num and days lookback
config
+ * SELECTED_PARTITIONS: output partition falls in the [start, end] condition
+ */
+public class ClusteringPlanPartitionFilter {
+
+ public static List<String> filter(List<String> partitions, HoodieWriteConfig
config) {
+ ClusteringPlanPartitionFilterMode mode =
config.getClusteringPlanPartitionFilterMode();
+ switch (mode) {
+ case NONE:
+ return partitions;
+ case RECENT_DAYS:
+ return recentDaysFilter(partitions, config);
+ case SELECTED_PARTITIONS:
+ return selectedPartitionsFilter(partitions, config);
+ default:
+ throw new HoodieClusteringException("Unknown partition filter, filter
mode: " + mode);
+ }
+ }
+
+ public static List<String> recentDaysFilter(List<String> partitions,
HoodieWriteConfig config) {
Review comment:
Does this method and the one below need to be public?
##########
File path:
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/cluster/strategy/TestClusteringPlanStrategy.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.cluster.strategy;
+
+import org.apache.hudi.config.HoodieClusteringConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.action.cluster.ClusteringPlanPartitionFilterMode;
+import org.apache.hudi.table.action.cluster.strategy.ClusteringPlanStrategy;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.util.stream.Stream;
+
+/**
+ *
+ */
+public class TestClusteringPlanStrategy {
Review comment:
Rename to `TestClusteringPlanStrategyConfigCompatibility`?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java
##########
@@ -56,7 +58,9 @@ public PartitionAwareClusteringPlanStrategy(HoodieTable
table, HoodieEngineConte
* Return list of partition paths to be considered for clustering.
*/
protected List<String> filterPartitionPaths(List<String> partitionPaths) {
- return partitionPaths;
+ List<String> filteredPartitions =
ClusteringPlanPartitionFilter.filter(partitionPaths, getWriteConfig());
+ LOG.info("Filtered to the following partitions: " + filteredPartitions);
Review comment:
We already get the partition info in replacecommit metadata. Is this log
necessary? Let's make it a debug level log if necessary.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/ClusteringPlanStrategy.java
##########
@@ -56,6 +59,37 @@
private final transient HoodieEngineContext engineContext;
private final HoodieWriteConfig writeConfig;
+ /**
+ * Check if the given class is deprecated.
+ * If it is, then try to convert it to suitable one and update the write
config accordingly.
+ * @param config write config
+ * @return class name of clustering plan strategy
+ */
+ public static String checkAndGetClusteringPlanStrategy(HoodieWriteConfig
config) {
+ String className = config.getClusteringPlanStrategyClass();
+ String sparkSizeBasedClassName =
"org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy";
Review comment:
Let's reuse the constants in `HoodieClusteringConfig`
##########
File path:
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/cluster/strategy/TestClusteringPlanStrategy.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table.cluster.strategy;
+
+import org.apache.hudi.config.HoodieClusteringConfig;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.table.action.cluster.ClusteringPlanPartitionFilterMode;
+import org.apache.hudi.table.action.cluster.strategy.ClusteringPlanStrategy;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.util.stream.Stream;
+
+/**
+ *
+ */
Review comment:
Let's remove this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]