alamb opened a new pull request, #10051:
URL: https://github.com/apache/arrow-rs/pull/10051

   # Which issue does this PR close?
   
   No issue; this is a follow-up guard test related to the discussion on #10020 (pluggable page spilling / `PageStore`).
   
   # Rationale for this change
   
   While reviewing #10020, I noticed that its deferred dictionary ordering makes `ArrowWriter` emit data pages before the dictionary page. This reorders the `page_encoding_stats` list so that the `DICTIONARY_PAGE` entry lands **last** instead of first. The on-disk page layout is still dictionary-first (the splice rewrites the page offsets) and round-trips are unaffected, but for dictionary-encoded columns the emitted metadata is no longer byte-identical to what `main` produces.
   
   There is currently **no test** asserting the order of `page_encoding_stats`, so this reordering goes unnoticed. This PR adds one.
   
   The test:
   - writes a low-cardinality (dictionary-encoded) column via `ArrowWriter`,
   - reads the **full** encoding stats via 
`ParquetMetaDataOptions::with_encoding_stats_as_mask(false)` (the default 
reader collapses them to a bitmask, which discards the ordering under test),
   - asserts the `DICTIONARY_PAGE` encoding stat precedes the `DATA_PAGE` stats.
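
   The property the test checks can be sketched without the `parquet` crate. The types below are hypothetical stand-ins for the Parquet `PageType`, `Encoding` and `PageEncodingStats` metadata (they are not the arrow-rs types), and `encoding_mask` only approximates what a mask-based reader keeps. The point of the sketch is that two orderings of the same entries collapse to the same mask, while an order-sensitive check tells them apart.

```rust
// Hypothetical stand-ins for Parquet page-encoding metadata; NOT the arrow-rs types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageType {
    DataPage,
    DictionaryPage,
    DataPageV2,
}

// Discriminants follow the `Encoding` values in parquet.thrift.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    Plain = 0,
    RleDictionary = 8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PageEncodingStats {
    page_type: PageType,
    encoding: Encoding,
    count: i32,
}

/// Collapses the stats into a set of encodings (one bit per encoding).
/// Any permutation of `stats` yields the same mask, so ordering is lost.
fn encoding_mask(stats: &[PageEncodingStats]) -> u32 {
    stats.iter().fold(0, |mask, s| mask | (1u32 << (s.encoding as u32)))
}

/// The invariant under test: a dictionary-page entry exists and no
/// data-page entry (v1 or v2) is listed before it.
fn dictionary_listed_first(stats: &[PageEncodingStats]) -> bool {
    let is_data =
        |s: &PageEncodingStats| matches!(s.page_type, PageType::DataPage | PageType::DataPageV2);
    match stats.iter().position(|s| s.page_type == PageType::DictionaryPage) {
        // No dictionary entry at all: the column is not dictionary encoded.
        None => false,
        // Only entries before the dictionary entry matter.
        Some(d) => !stats[..d].iter().any(is_data),
    }
}

fn main() {
    // Illustrative values only.
    let dict = PageEncodingStats { page_type: PageType::DictionaryPage, encoding: Encoding::Plain, count: 1 };
    let data = PageEncodingStats { page_type: PageType::DataPage, encoding: Encoding::RleDictionary, count: 4 };

    let dictionary_first = [dict, data];
    let dictionary_last = [data, dict];

    // The mask cannot distinguish the two orderings...
    assert_eq!(encoding_mask(&dictionary_first), encoding_mask(&dictionary_last));
    // ...but the order-sensitive check can.
    assert!(dictionary_listed_first(&dictionary_first));
    assert!(!dictionary_listed_first(&dictionary_last));

    let pages: i32 = dictionary_first.iter().map(|s| s.count).sum();
    println!("{pages} pages, dictionary first: {}", dictionary_listed_first(&dictionary_first));
}
```

   This is why the real test has to opt out of the bitmask representation: only the full list of entries preserves the order being asserted.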
   
   It **passes on `main`** and is expected to **fail** on the #10020 branch. It documents dictionary-first ordering as an explicit invariant, so any future reordering is caught.
   
   # What changes are included in this PR?
   
   A single new unit test, 
`dictionary_page_encoding_stats_lists_dictionary_first`, in 
`parquet/src/arrow/arrow_writer/mod.rs`. No production code changes.
   
   # Are these changes tested?
   
   The change is itself a test. I verified that it passes on `main`.
   
   # Are there any user-facing changes?
   
   No.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.