github-actions[bot] commented on PR #61126: URL: https://github.com/apache/doris/pull/61126#issuecomment-4029537318
## Code Review Summary

**PR:** [enhancement](segment) Support streaming page flush for ScalarColumnWriter

### Overview

This PR adds an opt-in `enable_streaming_page_flush` config (default: false) that causes `ScalarColumnWriter` to flush data pages to disk immediately when they become full in `finish_current_page()`, instead of caching them in memory until `write_data()`. The goal is to reduce memory consumption during segment writing.

### Critical Checkpoint Conclusions

**1. Goal and correctness:** The code accomplishes its stated goal. The streaming path in `finish_current_page()` correctly writes pages to disk immediately and accumulates `_total_compressed_data_pages_size` and `_total_uncompressed_data_pages_size` incrementally, and `write_data()` correctly skips data pages (writing only the dictionary page) in streaming mode. The test proves write-then-read correctness for both INT (BIT_SHUFFLE) and VARCHAR (DICT_ENCODING) types, with both sequential and seek-based reading.

**2. Modification scope:** The change is well-scoped: 5 files, focused on `ScalarColumnWriter` only. The refactoring of `collect_uncompressed_bytes` from a local lambda to a static member method is a clean improvement. The default-off config minimizes risk.

**3. Concurrency:** Not applicable. `ScalarColumnWriter` is single-threaded within a segment write. The `_streaming_flush_enabled` flag is captured once at construction from a mutable global config, which is safe (each writer has a consistent view). No new locks or shared state are introduced.

**4. Lifecycle management:** No special lifecycle concerns. Pages are either cached in `_pages` (traditional) or written and destroyed immediately (streaming); the two paths are mutually exclusive and gated by an immutable member bool.

**5. Configuration:** `enable_streaming_page_flush` is declared as `DEFINE_mBool` (mutable at runtime).
Since the value is snapshotted into `_streaming_flush_enabled` at `ScalarColumnWriter` construction, runtime changes take effect only for newly created writers, which is the correct behavior; no process restart is needed for changes to take effect.

**6. Incompatible changes / storage format:** None. Both modes produce an identical on-disk page format (same absolute `PagePointer` offsets in the ordinal index, same page serialization). The only difference is the physical ordering of pages within the file: in horizontal `SegmentWriter` mode, streaming flush interleaves data pages across columns. This is safe because readers locate pages by absolute `PagePointer` offset, not by sequential position. The config comment in the PR correctly notes this: "may result in interleaved data pages across columns."

**7. Parallel code paths:** `OffsetColumnWriter` (for array/map offset columns) inherits from `ScalarColumnWriter` and automatically gets streaming flush. This is safe: the `put_extra_info_in_page()` callback fires during page footer construction (before the write), and `_next_offset` is set before `finish_current_page()` is called. Composite writers (`ArrayColumnWriter`, `MapColumnWriter`, `StructColumnWriter`, `VariantColumnWriter`) delegate to leaf `ScalarColumnWriter` instances, so they all benefit automatically.

**8. Error handling:** If `_write_data_page()` fails in streaming mode, pages already written remain in the partially written file. This is safe: the file is never registered as a valid segment, and `BaseBetaRowsetWriter::~BaseBetaRowsetWriter()` cleans up partial files on failure.

**9. `estimate_buffer_size()` semantics:** In streaming mode, `_data_size` stays 0 (since `_push_back_page` is never called), so the PR adds `_total_compressed_data_pages_size` to compensate. This differs slightly from the traditional-mode estimate (actual compressed bytes vs. `page_body_size + 20`), but the difference is negligible (a few bytes per page), and the estimate is only used for coarse threshold decisions (segment size limits, disk capacity checks). The streaming-mode estimate is actually slightly more accurate.

**10. Test coverage:** Good unit test coverage: 8 test cases covering both modes × both data types (INT, VARCHAR), sequential and seek reads, small datasets (single page), and a file-size comparison between modes. Missing: no regression test and no multi-column segment-level test (to verify that interleaving works end-to-end), but this is acceptable for a default-off config at this stage.

**11. Observability:** No new metrics or logging added. Acceptable, given this is a config-gated feature with no new failure modes.

**12. Performance:** The refactoring from a local lambda to a static method is neutral. In streaming mode, the per-page disk write adds I/O latency during `append_data()`, but this is the intended tradeoff (latency for memory savings). No redundant operations are introduced.

### No blocking issues found.

Minor observations (non-blocking):

- The `DCHECK` assertions in the constructor (lines 483-490) should ideally be `DORIS_CHECK` per codebase standards, but these are pre-existing and not introduced by this PR.
- Consider adding a regression test that exercises the streaming path through a full multi-column segment write in a follow-up, to cover the interleaving scenario end-to-end.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
