[GitHub] [hudi] stream2000 commented on a diff in pull request #7366: [HUDI-5318] Fix partition pruning for clustering scheduling

GitBox Sun, 11 Dec 2022 19:39:47 -0800


stream2000 commented on code in PR #7366:
URL: https://github.com/apache/hudi/pull/7366#discussion_r1045379908



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##########
@@ -602,6 +603,46 @@ class TestClusteringProcedure extends HoodieSparkProcedureTestBase {
     }
   }
 
+  test("Test Call run_clustering with partition selected config") {
+    withTempDir { tmp =>
+      val tableName = generateTableName
+      val basePath = s"${tmp.getCanonicalPath}/$tableName"
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  id int,
+           |  name string,
+           |  price double,
+           |  ts long
+           |) using hudi
+           | options (
+           |  primaryKey ='id',
+           |  type = 'cow',
+           |  preCombineField = 'ts'
+           | )
+           | partitioned by(ts)
+           | location '$basePath'
+     """.stripMargin)
+
+      spark.sql(s"insert into $tableName values(1, 'a1', 10, 1010)")
+      spark.sql(s"insert into $tableName values(2, 'a2', 10, 1010)")
+      spark.sql(s"insert into $tableName values(3, 'a3', 10, 1011)")
+      spark.sql(s"set ${HoodieClusteringConfig.PARTITION_SELECTED.key()}=ts=1010")

Review Comment:
   Good suggestion. I will add a separate test case below that selects more
partitions. This case checks that when PARTITION_SELECTED is set, we use the
partitions designated in the config instead of listing all partitions. That is
why I select only some of the table's partitions when scheduling clustering.
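A minimal Scala sketch of the behavior described above. It assumes the selected-partitions value is a comma-separated list of partition paths. `PartitionSelection` and `partitionsToCluster` are hypothetical names used only for illustration and are not Hudi's actual API. The point is that the full partition listing runs only when nothing is configured.

```scala
// Hypothetical sketch, not Hudi's implementation: the configured partitions
// (assumed to be comma-separated) take precedence, and the potentially
// expensive full listing is invoked only when no selection is configured.
object PartitionSelection {
  def partitionsToCluster(
      selectedConfig: Option[String],
      listAllPartitions: () => Seq[String]): Seq[String] =
    selectedConfig.map(_.trim).filter(_.nonEmpty) match {
      case Some(value) =>
        // Use the configured partitions directly; no listing happens here.
        value.split(",").map(_.trim).filter(_.nonEmpty).toSeq
      case None =>
        // Nothing configured: fall back to listing every partition.
        listAllPartitions()
    }
}
```

For example, `partitionsToCluster(Some("ts=1010"), lister)` returns `Seq("ts=1010")` without ever calling `lister`, while passing `None` falls back to the full listing.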


