nsivabalan commented on code in PR #12259:
URL: https://github.com/apache/hudi/pull/12259#discussion_r1842856550
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
)
+ // update the secondary key column after creating multiple secondary indexes
+ spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+ // validate the secondary index records themselves
+ checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
Review Comment:
Can we also validate the entire contents of the secondary index? And let's validate the partition path meta field in the MDT partition as well, which will ensure the different secondary index partitions are validated for their contents.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1660,7 +1660,10 @@ public static String deleteMetadataTablePartition(HoodieTableMetaClient dataMeta
* @param index Index of the file group within the partition
* @return The fileID
*/
- public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) {
+ public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index, String partitionName) {
+   if (MetadataPartitionType.FUNCTIONAL_INDEX.equals(partitionType) || MetadataPartitionType.SECONDARY_INDEX.equals(partitionType)) {
+     return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
Review Comment:
Why do we need the replace for the functional and secondary index? Can you help me understand?
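To make the question concrete, here is a minimal standalone sketch of what this string formatting produces for an index partition name such as `secondary_index_idx_ts`. The class and method names below are hypothetical stand-ins, not Hudi's actual API; the formatting expression itself mirrors the diff. It suggests the underscores are being normalized to hyphens so the file ID follows a single hyphen-delimited convention end to end:

```java
// Hedged sketch, not Hudi's actual implementation: illustrates what the new
// index-partition branch of getFileIDForFileGroup returns.
public class FileIdSketch {
    // Hypothetical stand-in for the FUNCTIONAL_INDEX / SECONDARY_INDEX branch
    // of HoodieTableMetadataUtil.getFileIDForFileGroup in the diff above.
    static String fileIdForIndexPartition(String partitionName, int index) {
        // Underscores in the partition name are replaced with hyphens, then a
        // trailing hyphen, a zero-padded file group index, and a "-0" suffix
        // are appended.
        return String.format("%s%04d-%d",
                partitionName.replaceAll("_", "-").concat("-"), index, 0);
    }

    public static void main(String[] args) {
        System.out.println(fileIdForIndexPartition("secondary_index_idx_ts", 1));
        // prints: secondary-index-idx-ts-0001-0
    }
}
```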
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -192,6 +192,17 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
Seq(s"cde${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row2"),
Seq(s"def${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row3")
)
+ // update the secondary key column after creating multiple secondary indexes
+ spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
+ // validate the secondary index records themselves
+ checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
+ Seq(s"xyz${SECONDARY_INDEX_RECORD_KEY_SEPARATOR}row1", false)
+ )
+ // validate data and data skipping
+ checkAnswer(s"select ts, record_key_col, not_record_key_col, partition_key_col from $tableName where record_key_col = 'row1'")(
+ Seq(1, "row1", "xyz", "p1")
+ )
+ verifyQueryPredicate(hudiOpts, "not_record_key_col", "abc")
Review Comment:
Can we also do a query predicate on the "ts" column (the other secondary index)?
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -907,7 +907,7 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
fileGroupCount, partitionName, metadataPartition.getFileIdPrefix(), instantTime);
LOG.info(msg);
final List<String> fileGroupFileIds = IntStream.range(0, fileGroupCount)
- .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i))
+ .mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i, partitionName))
Review Comment:
So this impacts both the secondary index and the functional index as well, then?
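As an illustration of the question, here is a hedged standalone sketch of the call site above. The enum and method are simplified stand-ins, not Hudi's real types; it shows how mapping each file group index to a file ID would take the new branch for both `SECONDARY_INDEX` and `FUNCTIONAL_INDEX` partition types:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FileGroupIdsSketch {
    // Hypothetical stand-in for MetadataPartitionType (simplified).
    enum PartitionType { FUNCTIONAL_INDEX, SECONDARY_INDEX, FILES }

    // Hypothetical stand-in for HoodieTableMetadataUtil.getFileIDForFileGroup.
    static String fileIdForFileGroup(PartitionType type, int index, String partitionName) {
        if (type == PartitionType.FUNCTIONAL_INDEX || type == PartitionType.SECONDARY_INDEX) {
            // Both index types hit the underscore-to-hyphen branch.
            return String.format("%s%04d-%d",
                    partitionName.replaceAll("_", "-").concat("-"), index, 0);
        }
        // Placeholder for non-index partitions; the real method differs.
        return String.format("%s%04d-%d", partitionName + "-", index, 0);
    }

    public static void main(String[] args) {
        int fileGroupCount = 2;
        // Mirrors the IntStream mapping in initializeFileGroups above.
        List<String> ids = IntStream.range(0, fileGroupCount)
                .mapToObj(i -> fileIdForFileGroup(
                        PartitionType.SECONDARY_INDEX, i, "secondary_index_idx_ts"))
                .collect(Collectors.toList());
        System.out.println(ids);
        // prints: [secondary-index-idx-ts-0000-0, secondary-index-idx-ts-0001-0]
    }
}
```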
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]