c-thiel commented on code in PR #2188: URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r3357815396
########## crates/integration_tests/tests/read_variant.rs: ########## Review Comment: I had put these in `integration_tests` because they were genuinely Spark integration tests (reading real Spark-written variant data end-to-end through the REST catalog), and I wasn't aware we're trying to phase that suite out. Now that I know, I've moved them. Mitigation: the coverage is now in `arrow/reader/projection.rs` as self-contained unit tests over synthetic variant Parquet (full scan, variant-only, sibling-only, nested-in-struct) — they drive `ArrowReader` end-to-end (projection mask + decode), assert the id values and the variant metadata/value bytes round-trip exactly. `read_variant.rs` is deleted and the variant tables are removed from `dev/spark/provision.py`, so the PR no longer touches Spark provisioning at all. The one thing we lose is the real cross-engine interop check. If we want that back later it's a single-commit revert, but I think the unit tests cover the reader logic that actually mattered here. Longer term a cleaner interop source could be apache/parquet-testing (engine-agnostic, spec-canonical variant vectors) rather than Spark — I'd wire that in when we add variant value decoding / take on the #2546 annotation+shredding work, since that's what the corpus actually exercises. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
