github-actions[bot] commented on PR #61126: URL: https://github.com/apache/doris/pull/61126#issuecomment-4029537318
## Code Review Summary

**PR:** [enhancement](segment) Support streaming page flush for ScalarColumnWriter

### Overview

This PR adds an opt-in `enable_streaming_page_flush` config (default: false) that causes `ScalarColumnWriter` to flush data pages to disk immediately when they become full in `finish_current_page()`, instead of caching them in memory until `write_data()`. The goal is to reduce memory consumption during segment writing.

### Critical Checkpoint Conclusions

**1. Goal and correctness:** The code accomplishes its stated goal. The streaming path in `finish_current_page()` correctly writes pages to disk immediately and accumulates `_total_compressed_data_pages_size` and `_total_uncompressed_data_pages_size` incrementally, and `write_data()` correctly skips data pages (writing only the dictionary page) in streaming mode. The test proves write-then-read correctness for both INT (BIT_SHUFFLE) and VARCHAR (DICT_ENCODING) types, with both sequential and seek-based reading.

**2. Modification scope:** The change is well-scoped: 5 files, focused on `ScalarColumnWriter` only. The refactoring of `collect_uncompressed_bytes` from a local lambda to a static member method is a clean improvement. The default-off config minimizes risk.

**3. Concurrency:** Not applicable. `ScalarColumnWriter` is single-threaded within a segment write. The `_streaming_flush_enabled` flag is captured once at construction from a mutable global config, which is safe (each writer has a consistent view). No new locks or shared state are introduced.

**4. Lifecycle management:** No special lifecycle concerns. Pages are either cached in `_pages` (traditional) or written and destroyed immediately (streaming); the two paths are mutually exclusive and gated by an immutable member bool.

**5. Configuration:** `enable_streaming_page_flush` is declared as `DEFINE_mBool` (mutable at runtime).
Since the value is snapshotted into `_streaming_flush_enabled` at `ScalarColumnWriter` construction, runtime changes take effect only for newly created writers, which is the correct behavior; no process restart is needed for changes to take effect.

**6. Incompatible changes / storage format:** None. Both modes produce an identical on-disk page format (same absolute `PagePointer` offsets in the ordinal index, same page serialization). The only difference is the physical ordering of pages within the file: in horizontal `SegmentWriter` mode, streaming flush interleaves data pages across columns. This is safe because readers locate pages by absolute `PagePointer` offset, not by sequential position. The config comment in the PR correctly notes this: "may result in interleaved data pages across columns."

**7. Parallel code paths:** `OffsetColumnWriter` (for array/map offset columns) inherits from `ScalarColumnWriter` and automatically gets streaming flush. This is safe: the `put_extra_info_in_page()` callback fires during page footer construction (before the write), and `_next_offset` is set before `finish_current_page()` is called. Composite writers (`ArrayColumnWriter`, `MapColumnWriter`, `StructColumnWriter`, `VariantColumnWriter`) delegate to leaf `ScalarColumnWriter` instances, so they all benefit automatically.

**8. Error handling:** If `_write_data_page()` fails in streaming mode, pages already written remain in the partially written file. This is safe: the file is never registered as a valid segment, and `BaseBetaRowsetWriter::~BaseBetaRowsetWriter()` cleans up partial files on failure.

**9. `estimate_buffer_size()` semantics:** In streaming mode, `_data_size` stays 0 (since `_push_back_page` is never called), so the PR adds `_total_compressed_data_pages_size` to compensate. This differs slightly from the traditional-mode estimate (actual compressed bytes vs. `page_body_size + 20`), but the difference is negligible (a few bytes per page), and the estimate is only used for coarse threshold decisions (segment size limits, disk capacity checks). The streaming-mode estimate is actually slightly more accurate.

**10. Test coverage:** Good unit test coverage: 8 test cases covering both modes × both data types (INT, VARCHAR), sequential and seek reads, small datasets (single page), and a file-size comparison between modes. Missing: no regression test and no multi-column segment-level test (to verify that interleaving works end-to-end), but this is acceptable for a default-off config at this stage.

**11. Observability:** No new metrics or logging added. Acceptable, given this is a config-gated feature with no new failure modes.

**12. Performance:** The refactoring from a local lambda to a static method is neutral. In streaming mode, the per-page disk write adds I/O latency during `append_data()`, but this is the intended tradeoff (latency for memory savings). No redundant operations are introduced.

### No blocking issues found.

Minor observations (non-blocking):

- The `DCHECK` assertions in the constructor (lines 483-490) should ideally be `DORIS_CHECK` per codebase standards, but these are pre-existing and not introduced by this PR.
- Consider adding a regression test that exercises the streaming path through a full multi-column segment write in a follow-up, to cover the interleaving scenario end-to-end.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
