lokeshj1703 commented on code in PR #12259:
URL: https://github.com/apache/hudi/pull/12259#discussion_r1843904230


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -907,7 +907,7 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
         fileGroupCount, partitionName, metadataPartition.getFileIdPrefix(), instantTime);
     LOG.info(msg);
     final List<String> fileGroupFileIds = IntStream.range(0, fileGroupCount)
-        .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i))
+        .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i, partitionName))

Review Comment:
   Yes, it impacts both indexes.



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1660,7 +1660,10 @@ public static String deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
    * @param index         Index of the file group within the partition
    * @return The fileID
    */
-  public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) {
+  public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index, String partitionName) {
+    if (MetadataPartitionType.FUNCTIONAL_INDEX.equals(partitionType) || MetadataPartitionType.SECONDARY_INDEX.equals(partitionType)) {
+      return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);

Review Comment:
   The issue impacts both the functional and the secondary index, so the fix is required for both.
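
   The new branch above derives the fileID from the index partition name. A minimal standalone sketch of that format logic (class and method names here are hypothetical, not part of the PR; only the `String.format` expression is taken from the diff):

   ```java
   public class FileIdFormatSketch {
     // Mirrors the new branch in getFileIDForFileGroup: underscores in the
     // partition name become dashes, followed by a zero-padded file-group
     // index and a trailing "-0" suffix.
     static String fileIdFor(String partitionName, int index) {
       return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
     }

     public static void main(String[] args) {
       // e.g. a hypothetical secondary index partition named "secondary_index_idx_ts"
       System.out.println(fileIdFor("secondary_index_idx_ts", 3));
       // prints secondary-index-idx-ts-0003-0
     }
   }
   ```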



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
         Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
         Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
       )
+      // update the secondary key column after creating multiple secondary indexes
+      spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+      // validate the secondary index records themselves
+      checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(

Review Comment:
   Addressed



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
         Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
         Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
       )
+      // update the secondary key column after creating multiple secondary indexes
+      spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+      // validate the secondary index records themselves
+      checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
+        Seq(s"xyz${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row1", false)
+      )
+      // validate data and data skipping
+      checkAnswer(s"select ts, record_key_col, not_record_key_col, partition_key_col from $tableName where record_key_col = 'row1'")(
+        Seq(1, "row1", "xyz", "p1")
+      )
+      verifyQueryPredicate(hudiOpts, "not_record_key_col", "abc")

Review Comment:
   Addressed
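
   The test above matches secondary-index rows with `key like '%row1'`, which works because the record key is the suffix of the composite index key. A minimal sketch of that key shape (the separator value below is a placeholder assumption; the actual `SECONDARY_INDEX_RECORD_KEY_SEPARATOR` constant is not shown in this diff):

   ```java
   public class SecondaryIndexKeySketch {
     // Placeholder for Hudi's SECONDARY_INDEX_RECORD_KEY_SEPARATOR; the real
     // value is defined in Hudi's metadata code, not in this PR excerpt.
     static final String SEPARATOR = "$";

     // Composite key: secondary key value, then separator, then record key.
     static String secondaryIndexKey(String secondaryKey, String recordKey) {
       return secondaryKey + SEPARATOR + recordKey;
     }

     public static void main(String[] args) {
       String key = secondaryIndexKey("xyz", "row1");
       // A predicate like "key like '%row1'" matches because the record key
       // is always the suffix of the composite key.
       System.out.println(key.endsWith("row1"));
       // prints true
     }
   }
   ```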



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
