Re: [PR] fix: Handle deletes and updates properly in secondary index [hudi]

via GitHub Sun, 19 Oct 2025 23:10:07 -0700


yihua commented on code in PR #14090:
URL: https://github.com/apache/hudi/pull/14090#discussion_r2443890909



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -1767,6 +1767,146 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
     )
   }
 
+  /**
+   * Test Secondary Index with partition path update using global record index.
+   * This test validates that when a record moves from one partition (file group) to another
+   * using global index, the secondary index is correctly updated and queries work as expected.
+   *
+   * Test flow:
+   * 1. Create a table with global index enabled
+   * 2. Insert records into different partitions with a secondary index
+   * 3. Update partition path of a record (moving it from partition A to B)
+   * 4. Validate secondary index metadata is correct (no duplicates, no missing entry)
+   * 5. Validate query results using secondary index pruning
+   */
+  @ParameterizedTest
+  @CsvSource(Array("COPY_ON_WRITE,true", "COPY_ON_WRITE,false", "MERGE_ON_READ,true", "MERGE_ON_READ,false"))
+  def testSecondaryIndexWithPartitionPathUpdateUsingGlobalIndex(tableType: HoodieTableType,

Review Comment:
   On a `MERGE_ON_READ` table, a partition path update adds log files for the
deletes and inserts after the global index lookup, which also reads the file
groups.  So it would be good to have test coverage for the `MERGE_ON_READ`
table type.
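
   As a rough illustration only (not taken from the PR), the write options
for such a scenario might look like the sketch below. The table name and
column names are hypothetical, and the config keys, in particular
`hoodie.record.index.update.partition.path`, are assumptions to verify
against the Hudi version in use.

```scala
// Minimal sketch: options for an upsert that may move a record across
// partitions on a MERGE_ON_READ table backed by a global record index.
// All names below are illustrative assumptions, not the PR's test setup.
object PartitionPathUpdateOpts {
  val hudiOpts: Map[String, String] = Map(
    "hoodie.table.name" -> "sec_idx_demo",                        // hypothetical table name
    "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
    "hoodie.datasource.write.operation" -> "upsert",
    "hoodie.datasource.write.recordkey.field" -> "record_key",    // hypothetical column
    "hoodie.datasource.write.partitionpath.field" -> "partition", // hypothetical column
    "hoodie.metadata.enable" -> "true",
    "hoodie.metadata.record.index.enable" -> "true",
    "hoodie.index.type" -> "RECORD_INDEX",
    // Assumed key: allow an upsert to relocate a record to a new partition
    // (a delete in the old partition plus an insert in the new one).
    "hoodie.record.index.update.partition.path" -> "true"
  )
}
```

   A map like this would typically be passed through `.options(...)` on a
Spark `DataFrameWriter` with `format("hudi")`.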



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
