KnightChess commented on code in PR #7304:
URL: https://github.com/apache/hudi/pull/7304#discussion_r1032930294


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/procedure/TestClusteringProcedure.scala:
##########
@@ -385,4 +395,189 @@ class TestClusteringProcedure extends 
HoodieSparkProcedureTestBase {
       }
     }
   }
+
+  test("Test Call run_clustering Procedure op") {
+    withTempDir { tmp =>
+      val tableName = generateTableName
+      val basePath = s"${tmp.getCanonicalPath}/$tableName"
+
+      spark.sql(
+        s"""
+           |create table $tableName (
+           |  c1 int,
+           |  c2 string,
+           |  c3 double
+           |) using hudi
+           | options (
+           |  primaryKey = 'c1',
+           |  type = 'cow',
+           |  hoodie.metadata.enable = 'true',
+           |  hoodie.metadata.index.column.stats.enable = 'true',
+           |  hoodie.enable.data.skipping = 'true',
+           |  hoodie.datasource.write.operation = 'insert'
+           | )
+           | location '$basePath'
+     """.stripMargin)
+
+      writeRecords(2, 4, 0, basePath, Map("hoodie.avro.schema.validate"-> 
"false"))
+      val conf = new Configuration
+      val metaClient = 
HoodieTableMetaClient.builder.setConf(conf).setBasePath(basePath).build
+      metaClient.reloadActiveTimeline()
+      assert(0 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count())
+      
assert(metaClient.getActiveTimeline.filterPendingReplaceTimeline().empty())
+
+      spark.sql(s"call run_clustering(table => '$tableName', op => 
'schedule')")
+      metaClient.reloadActiveTimeline()
+      assert(0 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count())
+      assert(1 == 
metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count())
+
+      spark.sql(s"call run_clustering(table => '$tableName', op => 'execute')")
+      metaClient.reloadActiveTimeline()
+      assert(1 == 
metaClient.getActiveTimeline.getCompletedReplaceTimeline.getInstants.count())
+      assert(0 == 
metaClient.getActiveTimeline.filterPendingReplaceTimeline().getInstants.count())
+
+      spark.sql(s"call run_clustering(table => '$tableName')")

Review Comment:
   `scheduleandexecute` is the default op; I will add a test case for an invalid op value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to