nsivabalan commented on code in PR #12259:
URL: https://github.com/apache/hudi/pull/12259#discussion_r1842856550
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
)
+ // update the secondary key column after creating multiple secondary indexes
+ spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+ // validate the secondary index records themselves
+ checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
Review Comment:
Can we also validate the entire contents of the secondary index? And let's validate the partition path meta field in the MDT partition as well, which will ensure the different secondary index partitions are validated for their contents.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1660,7 +1660,10 @@ public static String deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
* @param index Index of the file group within the partition
* @return The fileID
*/
- public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) {
+ public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index, String partitionName) {
+   if (MetadataPartitionType.FUNCTIONAL_INDEX.equals(partitionType) || MetadataPartitionType.SECONDARY_INDEX.equals(partitionType)) {
+     return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
Review Comment:
Why do we need the replace for the functional and secondary index? Can you help me understand?
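To make the question concrete, here is a minimal standalone sketch of what this string formatting produces for an index partition name such as `secondary_index_idx_ts`. The class and method names below are hypothetical stand-ins, not Hudi's actual API; the formatting expression itself mirrors the diff. It suggests the underscores are being normalized to hyphens so the file ID follows a single hyphen-delimited convention end to end:

```java
// Hedged sketch, not Hudi's actual implementation: illustrates what the new
// index-partition branch of getFileIDForFileGroup returns.
public class FileIdSketch {
    // Hypothetical stand-in for the FUNCTIONAL_INDEX / SECONDARY_INDEX branch
    // of HoodieTableMetadataUtil.getFileIDForFileGroup in the diff above.
    static String fileIdForIndexPartition(String partitionName, int index) {
        // Underscores in the partition name are replaced with hyphens, then a
        // trailing hyphen, a zero-padded file group index, and a "-0" suffix
        // are appended.
        return String.format("%s%04d-%d",
                partitionName.replaceAll("_", "-").concat("-"), index, 0);
    }

    public static void main(String[] args) {
        System.out.println(fileIdForIndexPartition("secondary_index_idx_ts", 1));
        // prints: secondary-index-idx-ts-0001-0
    }
}
```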
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
)
+ // update the secondary key column after creating multiple secondary indexes
+ spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+ // validate the secondary index records themselves
+ checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
+ Seq(s"xyz${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row1", false)
+ )
+ // validate data and data skipping
+ checkAnswer(s"select ts, record_key_col, not_record_key_col, partition_key_col from $tableName where record_key_col = 'row1'")(
+ Seq(1, "row1", "xyz", "p1")
+ )
+ verifyQueryPredicate(hudiOpts, "not_record_key_col", "abc")
Review Comment:
Can we also do a query predicate on the "ts" column (the other secondary index)?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -907,7 +907,7 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
fileGroupCount, partitionName, metadataPartition.getFileIdPrefix(), instantTime);
LOG.info(msg);
final List<String> fileGroupFileIds = IntStream.range(0, fileGroupCount)
- .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i))
+ .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i, partitionName))
Review Comment:
So this impacts both the secondary index and the functional index as well, then?
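As an illustration of the question, here is a hedged standalone sketch of the call site above. The enum and method are simplified stand-ins, not Hudi's real types; it shows how mapping each file group index to a file ID would take the new branch for both `SECONDARY_INDEX` and `FUNCTIONAL_INDEX` partition types:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FileGroupIdsSketch {
    // Hypothetical stand-in for MetadataPartitionType (simplified).
    enum PartitionType { FUNCTIONAL_INDEX, SECONDARY_INDEX, FILES }

    // Hypothetical stand-in for HoodieTableMetadataUtil.getFileIDForFileGroup.
    static String fileIdForFileGroup(PartitionType type, int index, String partitionName) {
        if (type == PartitionType.FUNCTIONAL_INDEX || type == PartitionType.SECONDARY_INDEX) {
            // Both index types hit the underscore-to-hyphen branch.
            return String.format("%s%04d-%d",
                    partitionName.replaceAll("_", "-").concat("-"), index, 0);
        }
        // Placeholder for non-index partitions; the real method differs.
        return String.format("%s%04d-%d", partitionName + "-", index, 0);
    }

    public static void main(String[] args) {
        int fileGroupCount = 2;
        // Mirrors the IntStream mapping in initializeFileGroups above.
        List<String> ids = IntStream.range(0, fileGroupCount)
                .mapToObj(i -> fileIdForFileGroup(
                        PartitionType.SECONDARY_INDEX, i, "secondary_index_idx_ts"))
                .collect(Collectors.toList());
        System.out.println(ids);
        // prints: [secondary-index-idx-ts-0000-0, secondary-index-idx-ts-0001-0]
    }
}
```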
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]