stevenzwu opened a new pull request, #16559: URL: https://github.com/apache/iceberg/pull/16559
## Summary

Reduces `TestStructuredStreamingRead3` parameter rows from 8 to 2 in v3.5 / v4.0 / v4.1, dropping the catalogs that don't add meaningful coverage for streaming-read semantics. The two rows kept are:

- **testhive** (Hive, async=true): Hive-metastore baseline.
- **testrest** (REST, async=false): REST is the OSS-strategic catalog.

Removed: `testhadoop` (HadoopCatalog isn't recommended for production) and `spark_catalog` (SessionCatalog wrapper differences live in DDL/table-resolution paths, not streaming reads; both `SparkCatalog` and `SparkSessionCatalog` resolve to `SparkTable` once a table is identified as Iceberg, and streaming reads don't exercise the wrapper-specific code paths).

The trim cuts the class's test invocation count from **264 → 66 (75% reduction)**. `TestStructuredStreamingRead3` was the **highest-CPU test class in the Spark core CI job** (20.3% of total test CPU on `(17, 4.1, 2.13, core)`, ~931 CPU-sec out of 4595).

Per-row coverage analysis shows no test method uses `assumeThat(catalogName)`, so no test gets silenced by the trim: every test still runs across both rows.

## Axis coverage: before

| # | catalog | async |
|---|---|---|
| 1 | testhive | false |
| 2 | testhive | true |
| 3 | testhadoop | false |
| 4 | testhadoop | true |
| 5 | testrest | false |
| 6 | testrest | true |
| 7 | spark_catalog (Hive) | false |
| 8 | spark_catalog (Hive) | true |

| Axis | Values present (rows) |
|---|---|
| Catalog | testhive (1, 2) · testhadoop (3, 4) · testrest (5, 6) · spark_catalog (7, 8) |
| async | false (1, 3, 5, 7) · true (2, 4, 6, 8) |

## Axis coverage: after

| # | catalog | async |
|---|---|---|
| 1 | testhive | true |
| 2 | testrest | false |

| Axis | Values present (rows) |
|---|---|
| Catalog | testhive (1) · testrest (2) |
| async | true (1) · false (2) |

Both async values are still tested; both strategic production catalogs are still tested.
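The axis tables and invocation counts above can be checked mechanically. The following is a minimal sketch, not the actual test code: the class name `ParameterMatrixSketch`, the `Row` type, and the helper methods are hypothetical, each row is reduced to the two axes discussed here (the real parameter rows carry more fields), and the figure of 33 test methods is inferred from 264 invocations over 8 rows rather than stated in the PR.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the parameter matrix; not the real test class.
class ParameterMatrixSketch {
  /** Simplified parameter row: only the catalog and async axes. */
  static final class Row {
    final String catalog;
    final boolean async;

    Row(String catalog, boolean async) {
      this.catalog = catalog;
      this.async = async;
    }
  }

  // The 8 rows listed in the "before" table.
  static final List<Row> BEFORE = List.of(
      new Row("testhive", false), new Row("testhive", true),
      new Row("testhadoop", false), new Row("testhadoop", true),
      new Row("testrest", false), new Row("testrest", true),
      new Row("spark_catalog", false), new Row("spark_catalog", true));

  // The 2 rows listed in the "after" table.
  static final List<Row> AFTER = List.of(
      new Row("testhive", true), new Row("testrest", false));

  // Assumption: inferred from the PR numbers, 264 invocations / 8 rows = 33 methods.
  static final int TEST_METHODS = 264 / 8;

  /** Every test method runs once per parameter row. */
  static int invocations(List<Row> rows) {
    return rows.size() * TEST_METHODS;
  }

  /** Distinct catalog names covered by the given rows. */
  static Set<String> catalogs(List<Row> rows) {
    Set<String> out = new TreeSet<>();
    for (Row r : rows) {
      out.add(r.catalog);
    }
    return out;
  }

  /** Distinct async values covered by the given rows. */
  static Set<Boolean> asyncValues(List<Row> rows) {
    Set<Boolean> out = new TreeSet<>();
    for (Row r : rows) {
      out.add(r.async);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("before: " + invocations(BEFORE)); // 264
    System.out.println("after:  " + invocations(AFTER));  // 66
    System.out.println("async values after: " + asyncValues(AFTER));
    System.out.println("catalogs after:     " + catalogs(AFTER));
  }
}
```

Under these assumptions the model reproduces the 264 and 66 figures, and shows that the trimmed matrix still covers both async values even though each catalog now sees only one of them.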
Joint coverage of `(testhive, false)` and `(testrest, true)` is sacrificed, but no test in the class depends on those joint combinations specifically (no `assumeThat` on `catalogName` or stacked predicates).

## Design rationale

- **Streaming read semantics aren't catalog-specific.** A streaming read on an Iceberg table goes through Spark's micro-batch source machinery, the Iceberg `SparkTable` DSv2 read path, and the `IncrementalScan` planning logic. The catalog backend is only involved at table-resolution time. Once the table is loaded, the streaming behavior is the same regardless of catalog.
- **`testhadoop` dropped.** `HadoopCatalog` isn't a production target for Iceberg. The streaming tests don't exercise anything HadoopCatalog-specific.
- **`spark_catalog` dropped.** The `SparkSessionCatalog` wrapper test is more valuable when it exercises code paths the wrapper actually intercepts (DDL routing, V1-vs-V2 fallback for non-Iceberg tables), none of which streaming reads touch.
- **REST kept at `async=false`, Hive at `async=true`.** This spreads the two async values across the two catalogs, so neither catalog is implicitly preferred as the one that runs both async modes.

## Test plan

- [x] `spotlessCheck` passes on all 3 Spark versions.
- [x] Local smoke run on Spark 4.1: `TestStructuredStreamingRead3` reports **66 tests, 0 skipped, 0 failures, 0 errors** (down from 264 invocations originally; 75% reduction confirmed).
- [x] Verified no test method uses `assumeThat(catalogName)`: `git grep "assumeThat"` in the file returned 0 matches, so the catalog axis trim cannot silence any test.
- [ ] Full Spark CI run on this branch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
