ingaleniranjan365 opened a new pull request, #16743: URL: https://github.com/apache/iceberg/pull/16743
## Performance issue

**File**: `core/src/main/java/org/apache/iceberg/SchemaUpdate.java:178`

`deletes` was declared as `List<Integer>`, making each `deletes.contains(id)` call an O(n) scan over the pending deletes. The check runs once per field in the schema during `apply()`, so on a wide schema with many deleted columns the total cost grows to O(fields²) in the worst case.

## Fix

Changed the type of `deletes` from `List<Integer>` to `Set<Integer>` (4 occurrences: declaration + `HashSet` construction + two call sites). `HashSet.contains()` is O(1) on average. All other behaviour is identical: `deletes` is append-only and never iterated in order.

## Evidence

Before: `List.contains()` is O(n) per call and is called once per field during schema apply, giving O(fields²) total in the worst case.

After: `HashSet.contains()` is O(1) per call, giving O(fields) total.

## Validation

- Test harness: `./gradlew :iceberg-core:test --tests "org.apache.iceberg.TestSchemaUpdate"`
- Tests pass after fix: ✅
- Fix scope: domain-free, independent, 1 file / 6 lines

🌀 Magic applied with [Wibey VS Code Extension](https://wibey.walmart.com/code) 🪄
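
## Illustration

The complexity argument above can be sketched in isolation. This is not the `SchemaUpdate` code: the class `DeletesLookupSketch`, the method `countKept`, and the field count are hypothetical stand-ins chosen only to show how the same per-field `contains()` check behaves with a `List` versus a `HashSet`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeletesLookupSketch {

  // Hypothetical stand-in for the per-field check made during apply():
  // visits every field id once and asks whether it is marked as deleted.
  static int countKept(int fieldCount, Collection<Integer> deletes) {
    int kept = 0;
    for (int id = 0; id < fieldCount; id++) {
      // ArrayList.contains scans linearly; HashSet.contains hashes the key.
      if (!deletes.contains(id)) {
        kept++;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    int fieldCount = 20_000; // assumed "wide schema" size, for illustration only

    // Mark every even field id as deleted.
    List<Integer> deletesAsList = new ArrayList<>();
    for (int id = 0; id < fieldCount; id += 2) {
      deletesAsList.add(id);
    }
    Set<Integer> deletesAsSet = new HashSet<>(deletesAsList);

    long t0 = System.nanoTime();
    int keptWithList = countKept(fieldCount, deletesAsList);
    long t1 = System.nanoTime();
    int keptWithSet = countKept(fieldCount, deletesAsSet);
    long t2 = System.nanoTime();

    // Both variants must agree on the result; only the lookup cost differs.
    System.out.println("kept (List): " + keptWithList + " in " + (t1 - t0) / 1_000 + " us");
    System.out.println("kept (Set):  " + keptWithSet + " in " + (t2 - t1) / 1_000 + " us");
  }
}
```

Both calls return the same count, which mirrors the claim that the change is behaviour-preserving as long as `deletes` is only appended to and queried with `contains()`. Timings from a loop like this are indicative only; a proper measurement would use a benchmark harness such as JMH.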
