nsivabalan commented on code in PR #14090:
URL: https://github.com/apache/hudi/pull/14090#discussion_r2437255948


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSecondaryIndexPruning.scala:
##########
@@ -1767,6 +1767,146 @@ class TestSecondaryIndexPruning extends SparkClientFunctionalTestHarness {
     )
   }
 
+  /**
+   * Test Secondary Index with partition path update using global record index.
+   * This test validates that when a record moves from one partition (file group) to another
+   * using global index, the secondary index is correctly updated and queries work as expected.
+   *
+   * Test flow:
+   * 1. Create a table with global index enabled
+   * 2. Insert records into different partitions with a secondary index
+   * 3. Update partition path of a record (moving it from partition A to B)
+   * 4. Validate secondary index metadata is correct (no duplicates, no missing entry)
+   * 5. Validate query results using secondary index pruning
+   */
+  @ParameterizedTest
+  @CsvSource(Array("COPY_ON_WRITE,true", "COPY_ON_WRITE,false", "MERGE_ON_READ,true", "MERGE_ON_READ,false"))
+  def testSecondaryIndexWithPartitionPathUpdateUsingGlobalIndex(tableType: HoodieTableType,

Review Comment:
   do we need both table types?
   can we just keep it to the COW table?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/MetadataIndexMapper.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.common.data.HoodieData;
+import org.apache.hudi.common.function.SerializableFunction;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.config.HoodieWriteConfig;
+
+import java.io.Serializable;
+import java.util.Iterator;
+import java.util.List;
+
+/**
+ * Abstract base class for metadata index mappers. Each index type can extend
+ * this to implement its own record generation logic.
+ */
+public abstract class MetadataIndexMapper implements SerializableFunction<WriteStatus, Iterator<HoodieRecord>>, Serializable {
+  protected final HoodieWriteConfig dataWriteConfig;
+
+  public MetadataIndexMapper(HoodieWriteConfig dataWriteConfig) {
+    this.dataWriteConfig = dataWriteConfig;
+  }
+
+  /**
+   * Generates metadata index records from a WriteStatus.
+   *
+   * @param writeStatus the write status to process
+   * @return list of metadata records
+   */
+  protected abstract List<HoodieRecord> generateRecords(WriteStatus writeStatus);

Review Comment:
   can we file a follow-up ticket to change this to return an iterator?
   I'm not sure how much benefit we'd get: to generate SI records for one file group, we have to read the previous version of the file slice and the new version of the file slice, then compare them to generate the SI records. So converting this to an iterator may not buy us much.
   Still, we can file a follow-up to look into it.
   
   But I don't expect it to give us any material gains.
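   The iterator-returning shape suggested above can be sketched as follows. This is a hedged, self-contained stand-in, not Hudi's actual API: `WriteStatus` and `HoodieRecord` are replaced by `String`, and the class names are hypothetical. It also illustrates the caveat in the comment: when record generation must materialize everything up front (reading and diffing both file-slice versions), wrapping the result in an iterator changes the call shape but not the memory profile.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hedged sketch: simplified stand-ins for WriteStatus/HoodieRecord so the
// example compiles on its own, without Hudi on the classpath.
abstract class IndexMapperSketch {
    // Current contract from the PR: records are fully materialized in a list.
    protected abstract List<String> generateRecords(String writeStatus);

    // Caller-facing shape (mirrors SerializableFunction#apply). Today this
    // can only wrap the already-materialized list; the proposed follow-up
    // would make generateRecords itself lazy so records stream out one at
    // a time instead of being buffered.
    public Iterator<String> apply(String writeStatus) {
        return generateRecords(writeStatus).iterator();
    }
}

class SecondaryIndexMapperSketch extends IndexMapperSketch {
    @Override
    protected List<String> generateRecords(String writeStatus) {
        // A real implementation would diff the previous and new file-slice
        // versions; here we fabricate one record per write status.
        return Arrays.asList("si-record:" + writeStatus);
    }
}
```

   Until generation is lazy end to end, the iterator is purely an adapter over the list, which is why no material gains are expected from this change alone.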



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to