[PR] Add SerializedRowGroupWriter::append_column_from_pages and avoid a copy at row-group splice [arrow-rs]

via GitHub Tue, 02 Jun 2026 12:21:55 -0700


alamb opened a new pull request, #10052:
URL: https://github.com/apache/arrow-rs/pull/10052


   # Which issue does this PR close?
   
   None directly — a small, self-contained writer cleanup that also lays 
groundwork for the pluggable page-spilling work in #10020.
   
   # Rationale for this change
   
   When the Arrow `ArrowWriter` flushes a row group, each column chunk's 
encoded pages (buffered as a `Vec<Bytes>`) are spliced into the output file. 
Today that splice goes through a bespoke `ArrowColumnChunkReader` that 
implements `Read`, driven by `std::io::copy`:
   
   ```rust
   // arrow_writer: ArrowColumnChunkData implements ChunkReader/Read ...
   writer.append_column(&self.data, self.close)
   // file/writer.rs:
   let mut read = reader.get_read(src_offset)?.take(src_length);
   std::io::copy(&mut read, &mut self.buf)?;   // copies every byte through an 
8 KiB buffer
   ```
   
   `std::io::copy` reads into an internal stack buffer 
(`ArrowColumnChunkReader::read` does `copy_from_slice` into it) and then writes 
that buffer out — so every byte of every column chunk is copied **twice**, in 
~`len / 8 KiB` write calls. The `Bytes` blobs are already contiguous and owned; 
we can write each straight to the sink with a single `write_all`.
   
   # What changes are included in this PR?
   
   - **New public API** `SerializedRowGroupWriter::append_column_from_pages<I: 
IntoIterator<Item = Bytes>>(pages, close)`: splices an already-encoded column 
chunk supplied as in-order byte buffers (typically its serialized pages), 
writing each directly to the output. It is a lower-overhead alternative to 
`append_column` for callers that already hold the encoded column as owned 
`Bytes`.
   - The Arrow writer uses it and **drops the `ArrowColumnChunkReader` / 
`ChunkReader` / `Length` adapter** (~50 lines).
   - `append_column` (the `ChunkReader`-based API) keeps identical behavior; 
its preamble and the offset-remap epilogue are factored into shared 
`begin_appended_column` / `finish_appended_column` helpers used by both splice 
paths.
   - Drive-by fix: the splice length-mismatch error used a single-arg 
`general_err!`, which does **not** run `format!`, so it printed the literal 
text `{read_length} got {write_length}` rather than the values.
   
   # Are these changes tested?
   
   - Full `parquet` lib test suite passes (`--features "arrow async"`, 1169 
tests), plus the `arrow_writer_layout` integration test and the 
encryption-feature build/tests.
   - **Byte-identical output verified**: writing the same multi-row-group 
dictionary + string data on this branch vs. `main` produces identical files 
(`cmp` clean).
   
   # Are there any user-facing changes?
   
   One additive public method (`append_column_from_pages`). No breaking 
changes; default output is byte-identical.
   
   ## Performance note
   
   This removes a provable extra `memcpy` of each column chunk at splice. I did 
**not** see a measurable wall-clock change in the `writer_overhead` 
microbenchmark (within ±2–4% noise at 10 samples) — that benchmark writes ~1 
row per column, so there are essentially no page bytes to copy and it measures 
per-column metadata overhead, not the splice. The copy is also small relative 
to encode + compress for typical data. The win is reduced copying/syscalls 
proportional to column-chunk size, most relevant for large chunks and 
unbuffered sinks; happy to have the benchmark bot run the broader suite.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] Add SerializedRowGroupWriter::append_column_from_pages and avoid a copy at row-group splice [arrow-rs]

Reply via email to