nsivabalan commented on code in PR #18191:
URL: https://github.com/apache/hudi/pull/18191#discussion_r2830354187
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -1033,6 +1024,21 @@ public void testBinaryCopyClustering() throws Exception {
false, SqlQueryEqualityPreCommitValidator.class.getName(),
COUNT_SQL_QUERY_FOR_VALIDATION, "");
}
+  /**
+   * Tests clustering when {@code hoodie.clustering.plan.partition.parallel} is disabled (non-default).
+   * Ensures the code path that computes clustering groups on the driver serially (HoodieLocalEngineContext) is exercised.
+   */
+  @ParameterizedTest
+  @MethodSource("populateMetaFieldsParams")
+  public void testClusteringWithComputePlanPerPartitionParallelDisabled(boolean populateMetaFields) throws Exception {
+    initMetaClient(getPropertiesForKeyGen(populateMetaFields));
+    HoodieClusteringConfig clusteringConfig = createClusteringBuilder(true, 1)
+        .planPartitionParallel(false)
+        .build();
+    testInsertAndClustering(clusteringConfig, populateMetaFields, true,
Review Comment:
We need to validate that the local engine context is used and not the Spark engine context. @suryaprasanna wrote a test in one of his recent patches that asserts this; maybe you can take inspiration from that patch.
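A rough sketch of the kind of assertion meant here (`capturingStrategy` / `getCapturedEngineContext()` are hypothetical test-only hooks for illustration, not existing Hudi helpers and not necessarily what his patch does):
```java
// Sketch: a test-only subclass of PartitionAwareClusteringPlanStrategy could record
// which engine context is used while clustering groups are computed, and the test
// then asserts it is the local context rather than the Spark one.
HoodieEngineContext usedContext = capturingStrategy.getCapturedEngineContext();
assertTrue(usedContext instanceof HoodieLocalEngineContext,
    "plan generation should stay on the driver via HoodieLocalEngineContext");
assertFalse(usedContext instanceof HoodieSparkEngineContext,
    "plan generation should not fan out through HoodieSparkEngineContext");
```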
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java:
##########
@@ -347,6 +347,18 @@ public class HoodieClusteringConfig extends HoodieConfig {
+ "Please exercise caution while setting this config, especially
when clustering is done very frequently. This could lead to race condition in "
+ "rare scenarios, for example, when the clustering completes after
instants are fetched but before rollback completed.");
+ public static final ConfigProperty<Boolean> PLAN_PARTITION_PARALLEL =
ConfigProperty
Review Comment:
Let's try to find a better name for this. How about
`hoodie.clustering.plan.generation.use.local.engine.context`?
CC @yihua: can you think of a better name here?
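For reference, a rough sketch of how the renamed property could be declared (default flipped to match the inverted name; documentation text is just a placeholder):
```java
public static final ConfigProperty<Boolean> PLAN_GENERATION_USE_LOCAL_ENGINE_CONTEXT = ConfigProperty
    .key("hoodie.clustering.plan.generation.use.local.engine.context")
    .defaultValue(false) // parallel planning stays the default; local context only when opted in
    .withDocumentation("When enabled, clustering plan generation computes clustering groups "
        + "serially on the driver using the local engine context instead of the distributed one.");
```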
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnCopyOnWriteStorage.java:
##########
@@ -1033,6 +1024,21 @@ public void testBinaryCopyClustering() throws Exception {
false, SqlQueryEqualityPreCommitValidator.class.getName(),
COUNT_SQL_QUERY_FOR_VALIDATION, "");
}
+  /**
+   * Tests clustering when {@code hoodie.clustering.plan.partition.parallel} is disabled (non-default).
+   * Ensures the code path that computes clustering groups on the driver serially (HoodieLocalEngineContext) is exercised.
+   */
+  @ParameterizedTest
+  @MethodSource("populateMetaFieldsParams")
+  public void testClusteringWithComputePlanPerPartitionParallelDisabled(boolean populateMetaFields) throws Exception {
Review Comment:
We don't need the parameterized test; we can just go with the OOB behavior (populate meta fields = true).
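i.e. roughly this shape (sketch only, keeping the existing helpers as-is):
```java
@Test
public void testClusteringWithComputePlanPerPartitionParallelDisabled() throws Exception {
  // hard-code the OOB behavior: populate meta fields = true
  initMetaClient(getPropertiesForKeyGen(true));
  HoodieClusteringConfig clusteringConfig = createClusteringBuilder(true, 1)
      .planPartitionParallel(false)
      .build();
  // rest of the body stays the same, with `populateMetaFields` replaced by `true`
}
```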
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java:
##########
@@ -164,7 +165,14 @@ public Option<HoodieClusteringPlan> generateClusteringPlan(ClusteringPlanActionE
      return Option.empty();
    }
-    List<Pair<List<HoodieClusteringGroup>, String>> res = getEngineContext().map(partitionPaths, partitionPath -> {
+    final HoodieEngineContext engineContext;
+    if (getWriteConfig().isClusteringPlanPartitionParallel()) {
+      engineContext = getEngineContext();
+    } else {
+      engineContext = new HoodieLocalEngineContext(getEngineContext().getStorageConf());
Review Comment:
Wondering if we can automatically do this for non-partitioned tables as well.
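e.g. something along these lines (sketch; exact condition is up for discussion):
```java
// Sketch: also stay on the driver when there is effectively nothing to parallelize,
// e.g. a non-partitioned table where partitionPaths has a single entry.
final HoodieEngineContext engineContext;
if (getWriteConfig().isClusteringPlanPartitionParallel() && partitionPaths.size() > 1) {
  engineContext = getEngineContext();
} else {
  engineContext = new HoodieLocalEngineContext(getEngineContext().getStorageConf());
}
```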