c-thiel commented on code in PR #2188:
URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r3357815396


##########
crates/integration_tests/tests/read_variant.rs:
##########


Review Comment:
   I had put these in `integration_tests` because they were genuinely Spark 
integration tests (reading real Spark-written variant data end-to-end through 
the REST catalog), and I wasn't aware we're trying to phase that suite out. Now 
that I know, I've moved them.
   
   Mitigation: the coverage is now in `arrow/reader/projection.rs` as 
self-contained unit tests over synthetic variant Parquet (full scan, 
variant-only, sibling-only, nested-in-struct) — they drive `ArrowReader` 
end-to-end (projection mask + decode), assert the id values and the variant 
metadata/value bytes round-trip exactly. `read_variant.rs` is deleted and the 
variant tables are removed from `dev/spark/provision.py`, so the PR no longer 
touches Spark provisioning at all.
   
   The one thing we lose is the real cross-engine interop check. If we want 
that back later it's a single-commit revert, but I think the unit tests cover 
the reader logic that actually mattered here.
   
   Longer term a cleaner interop source could be apache/parquet-testing 
(engine-agnostic, spec-canonical variant vectors) rather than Spark — I'd wire 
that in when we add variant value decoding / take on the #2546 
annotation+shredding work, since that's what the corpus actually exercises.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to