wankunde opened a new pull request, #12233: URL: https://github.com/apache/gluten/pull/12233
## What changes are proposed in this pull request?

Why is this PR needed?

In `IcebergScanTransformer.typesMatch()`, the struct type matching logic creates temporary Iceberg `Schema` objects for every Spark field:

```scala
new Schema(currentType.fields()).findField(...)
new Schema(iceberg.fields()).findField(...)
```

This repeatedly rebuilds Iceberg schema indexes while checking historical schemas, which can become expensive for wide schemas or tables with many schema versions. In production thread dumps, this shows up as `Schema` / `IndexByName` / `HashMap` initialization during Iceberg scan planning.

Changes in this PR:

This change uses `Types.StructType.field(name)` and `Types.StructType.field(id)` directly when matching nested struct fields. `Types.StructType` already provides field lookup by name and by id, so this avoids constructing temporary `Schema` objects inside the field loop while preserving the existing matching behavior:

- find the current field by Spark field name
- find the old schema field by Iceberg field id
- keep allowing added columns
- keep detecting renamed columns by comparing field names

## How was this patch tested?

Tested with the existing unit tests.

## Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5
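The lookup pattern described in this PR can be sketched roughly as follows. This is a minimal illustration, not the actual `typesMatch()` code in Gluten: the object name `StructFieldLookupSketch`, the `classify` helper, its string results, and the two example struct types are all hypothetical. It assumes the Iceberg API jar is on the classpath and uses only `Types.StructType.field(name)` and `Types.StructType.field(id)`, both of which return `null` when no field matches.

```scala
import org.apache.iceberg.types.Types
import org.apache.iceberg.types.Types.NestedField

// Hypothetical sketch: classify a Spark field name against a current and an
// old Iceberg struct type without building temporary Schema objects.
object StructFieldLookupSketch {

  // Example "current" struct: field 2 was renamed and field 3 was added.
  val current: Types.StructType = Types.StructType.of(
    NestedField.required(1, "id", Types.LongType.get()),
    NestedField.optional(2, "full_name", Types.StringType.get()),
    NestedField.optional(3, "age", Types.IntegerType.get()))

  // Example "old" struct from an earlier schema version.
  val old: Types.StructType = Types.StructType.of(
    NestedField.required(1, "id", Types.LongType.get()),
    NestedField.optional(2, "name", Types.StringType.get()))

  def classify(sparkFieldName: String): String = {
    // Step 1: find the current field by Spark field name.
    Option(current.field(sparkFieldName)) match {
      case None => "missing"
      case Some(currentField) =>
        // Step 2: find the old field by Iceberg field id.
        Option(old.field(currentField.fieldId())) match {
          // Step 3: a field absent from the old schema is an added column.
          case None => "added"
          // Step 4: same id but a different name means the column was renamed.
          case Some(oldField) if oldField.name() != currentField.name() => "renamed"
          case Some(_) => "unchanged"
        }
    }
  }
}
```

In this sketch each lookup goes straight to the struct type, so nothing is reconstructed per Spark field. How the real code treats the "added" and "renamed" cases (for example, whether a rename causes a mismatch) is decided in `typesMatch()` itself and is not modeled here.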
