alamb opened a new pull request, #10051: URL: https://github.com/apache/arrow-rs/pull/10051
# Which issue does this PR close?

Follow-up / guard test related to the discussion on #10020 (pluggable page spilling / `PageStore`).

# Rationale for this change

While reviewing #10020 I noticed that the deferred-dictionary-ordering change makes the `ArrowWriter` emit data pages before the dictionary page, which reorders the `page_encoding_stats` list so the `DICTIONARY_PAGE` entry lands **last** instead of first. The on-disk page layout is still dictionary-first (the splice rewrites the page offsets), and round-trips are unaffected, but the emitted metadata is no longer byte-identical to `main` for dictionary columns.

There is currently **no test** asserting the order of `page_encoding_stats`, so that change is silent. This PR adds one. The test:

- writes a low-cardinality (dictionary-encoded) column via `ArrowWriter`,
- reads the **full** encoding stats via `ParquetMetaDataOptions::with_encoding_stats_as_mask(false)` (the default reader collapses them to a bitmask, which discards the ordering under test),
- asserts the `DICTIONARY_PAGE` encoding stat precedes the `DATA_PAGE` stats.

It **passes on `main`** and is intended to **fail** against the #10020 branch, documenting the dictionary-first ordering as an explicit invariant so any future reordering is caught.

# What changes are included in this PR?

A single new unit test, `dictionary_page_encoding_stats_lists_dictionary_first`, in `parquet/src/arrow/arrow_writer/mod.rs`. No production code changes.

# Are these changes tested?

The change *is* a test. Verified it passes on `main`.

# Are there any user-facing changes?

No.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
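
For readers without the branch checked out, the ordering invariant that the new test guards can be sketched in isolation. This is a standalone model, not the test added in this PR and not the `parquet` crate's API: `PageType`, `PageEncodingStat`, and `dictionary_stats_come_first` are hypothetical names introduced only for illustration, and the real test reads `page_encoding_stats` from the written file's metadata instead of building the list by hand.

```rust
/// Hypothetical stand-in for the Parquet page types (not the `parquet` crate's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageType {
    DictionaryPage,
    DataPage,
}

/// Hypothetical, simplified model of one `page_encoding_stats` entry.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct PageEncodingStat {
    page_type: PageType,
    encoding: &'static str,
}

/// Returns true when at least one dictionary entry exists and every
/// dictionary entry appears before the first data-page entry.
fn dictionary_stats_come_first(stats: &[PageEncodingStat]) -> bool {
    let last_dict = stats
        .iter()
        .rposition(|s| s.page_type == PageType::DictionaryPage);
    let first_data = stats
        .iter()
        .position(|s| s.page_type == PageType::DataPage);
    match (last_dict, first_data) {
        (Some(dict_idx), Some(data_idx)) => dict_idx < data_idx,
        (Some(_), None) => true, // only dictionary entries present
        (None, _) => false,      // no dictionary entry to check
    }
}
```

Under these assumptions, a list ordered as `[DictionaryPage, DataPage]` (the order described for `main`) satisfies the check, while `[DataPage, DictionaryPage]` (the order described for the #10020 branch) does not.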
