Baunsgaard opened a new pull request, #16696: URL: https://github.com/apache/iceberg/pull/16696
## What

`RandomData` (Spark test helper) sized every generated list and map with `random.nextInt(20)`. Because the bound is applied at *every* nesting level, the element counts multiply for deeply nested schemas. The worst case is `AvroDataTestBase.testMixedTypes`, which embeds the full ~19-field primitive struct two to three levels deep across five fields. Each run generates well over a million leaf values, so the test cost is dominated by random-data volume rather than by the read/write code paths being exercised.

This replaces the hard-coded `20` with a named constant `MAX_COLLECTION_SIZE = 10` in the Spark 3.5 / 4.0 / 4.1 test copies.

## Why

`testMixedTypes` is the single most expensive test method in the `iceberg-spark` core suite, appearing at the top of every format read/write test class. The collection size has no bearing on coverage: the schemas, types, and nesting structures under test are identical regardless of how many elements each collection holds, so this is pure scaffolding overhead.

## Impact

Measured locally (JDK 17, Spark 3.5 core), `testMixedTypes` per class, single-threaded:

| Class | before | after |
|---|---:|---:|
| TestAvroDataFrameWrite | 24.1s | 7.8s |
| TestParquetDataFrameWrite | 20.0s | 3.6s |
| TestORCDataFrameWrite | 19.9s | 3.4s |
| TestParquetScan | 17.7s | 2.6s |
| TestParquetVectorizedScan | 17.5s | 2.3s |
| TestAvroScan | 17.5s | 2.4s |

Collections still hold up to nine elements, preserving data variety.

## Testing

`./gradlew :iceberg-spark:iceberg-spark-3.5_2.13:test`: **5,084 tests, 0 failures** (identical pass/skip counts to before the change).
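
As a rough illustration of why a per-level bound compounds, the sketch below models nested collections whose sizes are each drawn with `random.nextInt(bound)`. It is a simplified model, not the actual `RandomData` code: the class `NestedSizeSketch` and its methods are hypothetical, and it assumes every level is a plain list with independently drawn sizes, ignoring the struct fields and maps in the real schema.

```java
import java.util.Random;

// Hypothetical model for illustration only; not part of Iceberg.
public class NestedSizeSketch {

  // Expected number of leaves when each of `depth` nested levels has a size
  // drawn uniformly from [0, bound), as random.nextInt(bound) does.
  // The mean size per level is (bound - 1) / 2, and independent levels multiply.
  public static double expectedLeaves(int bound, int depth) {
    double meanSize = (bound - 1) / 2.0;
    return Math.pow(meanSize, depth);
  }

  // Simulates one draw: counts the leaves of a list nested `depth` levels deep.
  public static long countLeaves(Random random, int bound, int depth) {
    if (depth == 0) {
      return 1; // a single primitive value
    }
    int size = random.nextInt(bound); // size is re-drawn at every level
    long leaves = 0;
    for (int i = 0; i < size; i++) {
      leaves += countLeaves(random, bound, depth - 1);
    }
    return leaves;
  }

  public static void main(String[] args) {
    for (int bound : new int[] {20, 10}) {
      for (int depth = 1; depth <= 3; depth++) {
        System.out.printf(
            "bound=%d depth=%d expected leaves=%.3f%n",
            bound, depth, expectedLeaves(bound, depth));
      }
    }
  }
}
```

Under this model, halving the bound shrinks the expected leaf count by more than half once collections are nested, because the per-level reduction is applied again at each level. The measured timings above need not follow the model exactly, since fixed per-test costs are not scaled by collection size.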
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
