yihua commented on code in PR #11947:
URL: https://github.com/apache/hudi/pull/11947#discussion_r1797787989
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSourceStorage.scala:
##########
@@ -185,7 +188,7 @@ class TestCOWDataSourceStorage extends SparkClientFunctionalTestHarness {
val hoodieIncViewDF1 = spark.read.format("org.apache.hudi")
.option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
.option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, "000")
- .option(DataSourceReadOptions.END_INSTANTTIME.key, firstCommit)
+ .option(DataSourceReadOptions.END_INSTANTTIME.key, completionTime1)
Review Comment:
Should this be from `completionTime1` to `completionTime2`?
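
A sketch of the suggested range, for clarity (assuming the surrounding test's `spark`, `basePath`, and the `completionTime1`/`completionTime2` variables from this PR; not a verified snippet from the patch):

```scala
// Hypothetical: incremental read over (completionTime1, completionTime2],
// per the suggestion, instead of "000" up to completionTime1.
val hoodieIncViewDF2 = spark.read.format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
  .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, completionTime1)
  .option(DataSourceReadOptions.END_INSTANTTIME.key, completionTime2)
  .load(basePath)
```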
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalReadWithFullTableScan.scala:
##########
@@ -70,9 +70,11 @@ class TestIncrementalReadWithFullTableScan extends HoodieSparkClientTestBase {
HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key -> "1"
)
+
+
Review Comment:
Remove the redundant empty lines.
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestIncrementalReadWithFullTableScan.scala:
##########
@@ -70,9 +70,11 @@ class TestIncrementalReadWithFullTableScan extends HoodieSparkClientTestBase {
HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key -> "1"
)
+
+
// Create 10 commits
for (i <- 1 to 10) {
- val records = recordsToStrings(dataGen.generateInserts("%05d".format(i), perBatchSize)).asScala.toList
+ val records = recordsToStrings(dataGen.generateInserts(System.currentTimeMillis(), perBatchSize)).asScala.toList
Review Comment:
Have you added any new test cases covering the scenarios where the
completion time ordering differs from the instant time ordering, verifying
that the incremental queries return the correct results for both the
incremental path and the fallback full table scan?
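
For illustration, the divergence between the two orderings can be shown without Hudi at all; a minimal plain-Scala sketch (the `Commit` case class and timestamps below are made up, not Hudi types):

```scala
// A commit that starts earlier (smaller instant time) can finish later
// (larger completion time), e.g. a long-running write overlapped by a
// short one. Filtering a range by the two orderings then selects
// different commit sets, which is what the new tests should exercise.
case class Commit(instantTime: String, completionTime: String)

val commits = Seq(
  Commit(instantTime = "001", completionTime = "005"), // starts first, finishes last
  Commit(instantTime = "002", completionTime = "003"),
  Commit(instantTime = "004", completionTime = "004")
)

// Commits falling in the half-open range (start, end] under a given ordering.
def inRange(key: Commit => String)(start: String, end: String): Seq[Commit] =
  commits.filter(c => key(c) > start && key(c) <= end)

val byInstant    = inRange(_.instantTime)("001", "004")    // picks 002 and 004
val byCompletion = inRange(_.completionTime)("003", "005") // picks 001 and 004

println(byInstant.map(_.instantTime))    // List(002, 004)
println(byCompletion.map(_.instantTime)) // List(001, 004)
```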
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSourceStorage.scala:
##########
@@ -232,7 +235,7 @@ class TestCOWDataSourceStorage extends SparkClientFunctionalTestHarness {
val timeTravelDF = spark.read.format("org.apache.hudi")
.option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
.option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, "000")
- .option(DataSourceReadOptions.END_INSTANTTIME.key, firstCommit)
+ .option(DataSourceReadOptions.END_INSTANTTIME.key, completionTime1)
.load(basePath)
Review Comment:
Should this be from `completionTime1` to `completionTime2`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]