Wenjun7J opened a new pull request, #16208: URL: https://github.com/apache/iceberg/pull/16208
<img width="2830" height="1180" alt="image" src="https://github.com/user-attachments/assets/ca838875-516f-415d-a610-b19dadd2620a" />

## What is changed

This change avoids rebuilding the same `PartitionData` Avro schema for every partition row when scanning the `partitions` metadata table.

Instead of creating a fresh `PartitionData(partitionType)` for each partition value, `PartitionsTable` now creates one `PartitionData` template per scan and reuses it through `copyFor(key)`.

A regression test is also added to verify that partition rows produced within the same scan reuse the same underlying Avro schema instance.

## Why

`PartitionsTable` currently constructs partition rows like this:

- create `PartitionData(partitionType)`
- convert partition type to Avro schema
- copy the partition key into the new object

When a table has many partition values, this repeats the same schema conversion for every row, creating heavy allocation pressure in:

- `PartitionData.partitionDataSchema`
- `AvroSchemaUtil.convert`
- `TypeToSchema$WithTypeToName.struct`

This is especially visible for wide partition specs and large metadata table scans.
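The reuse pattern described above can be sketched in isolation. The following is a minimal, self-contained illustration and not Iceberg's actual code: `Schema`, `Row`, `rowsPerKey`, `rowsFromTemplate`, and the `schemaBuilds` counter are hypothetical stand-ins for the Avro schema, `PartitionData`, and the two construction paths. Only the `copyFor` name is borrowed from this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaReuseSketch {
  // Counts how often the (hypothetical) expensive schema conversion runs.
  static int schemaBuilds = 0;

  /** Stand-in for an Avro schema derived from the partition type. */
  static final class Schema {
    final List<String> fieldNames;

    Schema(List<String> fieldNames) {
      schemaBuilds++; // every construction models one full schema conversion
      this.fieldNames = List.copyOf(fieldNames);
    }
  }

  /** Stand-in for a partition row that carries a schema plus its values. */
  static final class Row {
    final Schema schema;
    final Object[] values;

    // Expensive path: converts the partition fields into a new schema.
    Row(List<String> partitionFields) {
      this(new Schema(partitionFields), new Object[partitionFields.size()]);
    }

    private Row(Schema schema, Object[] values) {
      this.schema = schema;
      this.values = values;
    }

    // Cheap path: shares this row's schema and copies only the key values.
    Row copyFor(Object[] key) {
      return new Row(schema, key.clone());
    }
  }

  // Old behavior: one schema conversion per partition key.
  static List<Row> rowsPerKey(List<String> fields, List<Object[]> keys) {
    List<Row> rows = new ArrayList<>();
    for (Object[] key : keys) {
      Row row = new Row(fields);
      System.arraycopy(key, 0, row.values, 0, key.length);
      rows.add(row);
    }
    return rows;
  }

  // New behavior: one template per scan, reused through copyFor.
  static List<Row> rowsFromTemplate(List<String> fields, List<Object[]> keys) {
    Row template = new Row(fields);
    List<Row> rows = new ArrayList<>();
    for (Object[] key : keys) {
      rows.add(template.copyFor(key));
    }
    return rows;
  }

  public static void main(String[] args) {
    List<String> fields = List.of("year", "month", "day", "bucket");
    List<Object[]> keys = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      keys.add(new Object[] {2024, 1 + i % 12, 1 + i % 28, i % 16});
    }

    schemaBuilds = 0;
    rowsPerKey(fields, keys);
    System.out.println("per-row construction: " + schemaBuilds + " schema builds");

    schemaBuilds = 0;
    List<Row> rows = rowsFromTemplate(fields, keys);
    System.out.println("template + copyFor: " + schemaBuilds + " schema builds");
    System.out.println("same schema instance: " + (rows.get(0).schema == rows.get(999).schema));
  }
}
```

The identity comparison in `main` mirrors the idea behind the regression test: rows produced within one scan should point at the same schema object, so the number of conversions stays constant instead of growing with the number of partition values.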
### External reproduction

I used a standalone repro app that scans the Iceberg `partitions` metadata table for a table with:

- 20,000 partition values
- 4 partition columns
- repeated full `partitions` table scans

```java
// Counts the partition rows returned by one full scan of the metadata table.
long partitionRows = 0;
try (CloseableIterable<FileScanTask> tasks = partitionsTable.newScan().planFiles()) {
  for (FileScanTask task : tasks) {
    try (CloseableIterable<StructLike> rows = task.asDataTask().rows()) {
      for (StructLike row : rows) {
        // Column 0 of each row holds the partition struct.
        StructProjection partitionData = row.get(0, StructProjection.class);
        if (partitionData == null) {
          throw new IllegalStateException("Partition row returned null partition data");
        }
        partitionRows++;
      }
    }
  }
}
```

#### Before fix (origin/main)

- Average wall clock time: 12.71s
- Average max RSS: 5,938,604 KB (~5.66 GiB)

#### After fix

- Average wall clock time: 5.24s
- Average max RSS: 1,483,155 KB (~1.41 GiB)

#### Improvement

- Wall clock time reduced by 58.8%
- Max RSS reduced by 75.0%
