This is an automated email from the ASF dual-hosted git repository.

voonhous pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
     new fb08a156b6f6 perf(metadata): Resolve column-stats field schemas once per collection instead of per record (#19000)
fb08a156b6f6 is described below

commit fb08a156b6f62324864759669d1c88c0e3faf282
Author: voonhous <[email protected]>
AuthorDate: Wed Jun 17 02:37:44 2026 +0800

    perf(metadata): Resolve column-stats field schemas once per collection instead of per record (#19000)
    
    * perf(metadata): Resolve column-stats field schemas once per collection instead of per record
    
    collectColumnRangeMetadata iterated the target columns inside the per-record
    loop and, for every record and every target field, recomputed values that
    depend only on the fixed target field list:
    - field.schema().getNonNullType() rebuilds the union-member wrappers for
      nullable fields (a fresh HoodieSchema per call), and
    - because that non-null schema was a fresh instance per record, its
      toAvroSchema() (used for min/max compares) never memoized.
    
    Resolve the non-null HoodieSchema once per target field before the loop and
    iterate that precomputed list; holding a stable instance also lets
    toAvroSchema() memoize across records. Per-record work (type-support check,
    value extraction, min/max/null/value counts) is unchanged, so results are
    identical. Covered by the existing column-stats functional tests.
    
    Closes #18999
    
    * review(metadata): stream per-field schema resolution and drop toAvroSchema memoize note
---
 .../org/apache/hudi/metadata/HoodieTableMetadataUtil.java    | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
index 4823f041bae6..cfdeb45b84ed 100644
--- a/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
+++ b/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
@@ -276,15 +276,19 @@ public class HoodieTableMetadataUtil {
     final Properties properties = new Properties();
     properties.setProperty(HoodieStorageConfig.WRITE_UTC_TIMEZONE.key(),
         storageConfig.getString(HoodieStorageConfig.WRITE_UTC_TIMEZONE.key(), HoodieStorageConfig.WRITE_UTC_TIMEZONE.defaultValue().toString()));
+    // getNonNullType() rebuilds the union-member wrappers for nullable fields and depends only on the
+    // (fixed) target fields, so resolve it once per field instead of once per record per field.
+    List<Pair<String, HoodieSchema>> nonNullFieldSchemas = targetFields.stream()
+        .map(p -> Pair.of(p.getKey(), p.getValue().schema().getNonNullType()))
+        .collect(Collectors.toList());
     // Collect stats for all columns by iterating through records while accounting
     // corresponding stats
     records.forEachRemaining((record) -> {
       // For each column (field) we have to index update corresponding column stats
       // with the values from this record
-      targetFields.forEach(fieldNameFieldPair -> {
-        String fieldName = fieldNameFieldPair.getKey();
-        HoodieSchemaField field = fieldNameFieldPair.getValue();
-        HoodieSchema fieldSchema = field.schema().getNonNullType();
+      nonNullFieldSchemas.forEach(fieldNameSchemaPair -> {
+        String fieldName = fieldNameSchemaPair.getKey();
+        HoodieSchema fieldSchema = fieldNameSchemaPair.getValue();
         if (!isColumnTypeSupported(fieldSchema, Option.of(record.getRecordType()), indexVersion)) {
           return;
         }
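
The change in the diff above amounts to hoisting a loop-invariant lookup out of the per-record loop. The following is a minimal, self-contained sketch of that pattern, not the Hudi code itself: `FakeSchema`, `collectPerRecord`, and `collectHoisted` are hypothetical names introduced only for illustration, and `FakeSchema.getNonNullType()` merely imitates an accessor that builds a fresh object on every call. A counter records how often the accessor runs in each variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class HoistSchemaResolution {

  /** Hypothetical stand-in for a nullable field schema; this is not the Hudi class. */
  static final class FakeSchema {
    static final AtomicInteger RESOLVE_CALLS = new AtomicInteger();
    final String type;

    FakeSchema(String type) {
      this.type = type;
    }

    /** Imitates an accessor that allocates a new object on every call. */
    FakeSchema getNonNullType() {
      RESOLVE_CALLS.incrementAndGet();
      return new FakeSchema(type);
    }
  }

  /** Before: the invariant lookup runs once per record per field. */
  static List<String> collectPerRecord(List<FakeSchema> targetFields, List<String> records) {
    List<String> seen = new ArrayList<>();
    records.iterator().forEachRemaining(record ->
        targetFields.forEach(field -> {
          FakeSchema fieldSchema = field.getNonNullType();
          seen.add(record + ":" + fieldSchema.type);
        }));
    return seen;
  }

  /** After: resolve once per field up front, then iterate the precomputed list. */
  static List<String> collectHoisted(List<FakeSchema> targetFields, List<String> records) {
    List<FakeSchema> nonNullSchemas = targetFields.stream()
        .map(FakeSchema::getNonNullType)
        .collect(Collectors.toList());
    List<String> seen = new ArrayList<>();
    records.iterator().forEachRemaining(record ->
        nonNullSchemas.forEach(fieldSchema -> seen.add(record + ":" + fieldSchema.type)));
    return seen;
  }

  public static void main(String[] args) {
    List<FakeSchema> fields = Arrays.asList(new FakeSchema("int"), new FakeSchema("string"));
    List<String> records = Arrays.asList("r1", "r2", "r3");

    FakeSchema.RESOLVE_CALLS.set(0);
    List<String> before = collectPerRecord(fields, records);
    int callsBefore = FakeSchema.RESOLVE_CALLS.get();

    FakeSchema.RESOLVE_CALLS.set(0);
    List<String> after = collectHoisted(fields, records);
    int callsAfter = FakeSchema.RESOLVE_CALLS.get();

    System.out.println("same results: " + before.equals(after));
    System.out.println("resolve calls: " + callsBefore + " vs " + callsAfter);
  }
}
```

With three records and two fields, the per-record variant calls the accessor six times and the hoisted variant calls it twice, while both produce the same output list. This mirrors the commit's claim that per-record work is unchanged and only the invariant resolution moves out of the loop.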
