vinishjail97 commented on code in PR #17694:
URL: https://github.com/apache/hudi/pull/17694#discussion_r2710839581
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestColumnStatsIndex.scala:
##########
@@ -259,6 +259,221 @@ class TestColumnStatsIndex extends ColumnStatIndexTestBase {
addNestedFiled = true)
}
+ /**
+ * Tests data skipping with nested MAP and ARRAY fields in column stats index.
+ * This test verifies that queries can efficiently skip files based on nested field values
+ * within MAP and ARRAY types using the new Parquet-style accessor patterns.
+ */
+ @ParameterizedTest
Review Comment:
Added tests.
##########
hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaUtils.java:
##########
@@ -59,6 +59,41 @@ public final class HoodieSchemaUtils {
public static final HoodieSchema METADATA_FIELD_SCHEMA = HoodieSchema.createNullable(HoodieSchemaType.STRING);
public static final HoodieSchema RECORD_KEY_SCHEMA = initRecordKeySchema();
+ /**
+ * Constants for Parquet-style accessor patterns used in nested MAP and ARRAY navigation.
+ * These patterns are specifically used for column stats generation and differ from
+ * InternalSchema constants which are used in schema evolution contexts.
+ */
+ private static final String ARRAY_LIST = "list";
+ private static final String ARRAY_ELEMENT = "element";
+ private static final String ARRAY_SPARK = "array"; // Spark writer uses this
Review Comment:
Addressed.