sollhui opened a new pull request, #62032:
URL: https://github.com/apache/doris/pull/62032
## Problem
Several IO-layer read buffers were allocated via `std::make_unique<char[]>` / `new char[]`, which bypasses Doris's memory tracking system (`MemTrackerLimiter`). These allocations are invisible to the query memory tracker. That leads to under-reported memory usage and potential OOM surprises under concurrent S3/HDFS scans.
Affected locations:
- `PrefetchBuffer::_buf` in `buffered_reader`: the main S3/HDFS prefetch buffer
- `HttpFileReader::_read_buffer`: the 1 MB HTTP read buffer
- `HdfsFileSystem::download_impl`: the 1 MB copy buffer for remote→local download
## Solution
Replace the raw `unique_ptr<char[]>` / `new char[]` allocations with `PODArray<char>`, which uses Doris's `Allocator<..., check_and_tracking_memory=true>` internally. Every `alloc`/`realloc`/`free` call goes through `consume_memory` / `release_memory` on the thread-local `MemTrackerLimiter`, so these buffers are now properly accounted for.
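To make the accounting difference concrete, here is a toy sketch. `ToyTracker`, `tls_tracker`, `TrackedCharBuffer`, and `tracked_during_raw_alloc` are hypothetical stand-ins, not Doris's `MemTrackerLimiter`, `Allocator`, or `PODArray` APIs. They only mimic the idea described above: every allocation, reallocation, and free performed by the container is reported to a thread-local tracker, while a raw `std::make_unique<char[]>` never is. The sketch tracks exact sizes and omits whatever capacity policy the real classes use.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for the thread-local MemTrackerLimiter: it only
// knows about bytes that are reported to it explicitly.
struct ToyTracker {
    int64_t consumed = 0;
    void consume_memory(int64_t bytes) { consumed += bytes; }
    void release_memory(int64_t bytes) { consumed -= bytes; }
};
thread_local ToyTracker tls_tracker;

// Hypothetical PODArray-like char buffer whose allocations are all tracked.
class TrackedCharBuffer {
public:
    TrackedCharBuffer() = default;  // allocates nothing
    ~TrackedCharBuffer() { reset(); }
    TrackedCharBuffer(TrackedCharBuffer&& other) noexcept
            : _data(std::exchange(other._data, nullptr)), _size(std::exchange(other._size, 0)) {}
    TrackedCharBuffer& operator=(TrackedCharBuffer&& other) noexcept {
        if (this != &other) {
            reset();
            _data = std::exchange(other._data, nullptr);
            _size = std::exchange(other._size, 0);
        }
        return *this;
    }
    TrackedCharBuffer(const TrackedCharBuffer&) = delete;
    TrackedCharBuffer& operator=(const TrackedCharBuffer&) = delete;

    void resize(size_t n) {
        if (n == _size) return;
        if (n == 0) {
            reset();
            return;
        }
        void* p = (_data == nullptr) ? std::malloc(n) : std::realloc(_data, n);
        if (p == nullptr) throw std::bad_alloc();  // old block (if any) is still valid
        // Report the size delta; it is negative when the buffer shrinks.
        tls_tracker.consume_memory(static_cast<int64_t>(n) - static_cast<int64_t>(_size));
        _data = static_cast<char*>(p);
        _size = n;
    }
    char* data() { return _data; }
    size_t size() const { return _size; }
    bool empty() const { return _size == 0; }

private:
    void reset() {
        if (_data != nullptr) {
            std::free(_data);
            tls_tracker.release_memory(static_cast<int64_t>(_size));
            _data = nullptr;
            _size = 0;
        }
    }
    char* _data = nullptr;
    size_t _size = 0;
};

// Returns what the tracker reports while a raw 1 MB buffer is alive.
int64_t tracked_during_raw_alloc() {
    auto raw = std::make_unique<char[]>(1 << 20);  // never reported to tls_tracker
    raw[0] = 'x';
    return tls_tracker.consumed;
}
```

In this toy model the raw buffer leaves the tracker at zero, while the tracked buffer's 1 MB shows up as soon as it is resized and disappears when it is destroyed.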
Key behavioral notes:
- `PODArray`'s default constructor allocates no memory, so the lazy-allocation optimization in `PrefetchBuffer` (introduced to reduce peak memory during TVF scans over many small S3 files) is fully preserved: the buffer is only allocated on the first actual prefetch call.
- `PODArray` supports move semantics, so the existing move constructor of `PrefetchBuffer` works unchanged.
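The two notes above can be illustrated with a heavily simplified, hypothetical `ToyPrefetchBuffer` that uses the same `_buf.empty()` / `_buf.resize` guard. As an assumption for the sake of a self-contained example, `std::vector<char>` stands in for `PODArray<char>` because it shares the two properties the notes rely on: default construction allocates nothing, and it is movable. Unlike `PODArray`, `std::vector` does no memory tracking and zero-fills on `resize`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified PrefetchBuffer; std::vector<char> stands in for PODArray<char>.
class ToyPrefetchBuffer {
public:
    explicit ToyPrefetchBuffer(size_t buf_size) : _size(buf_size) {}  // no allocation here
    ToyPrefetchBuffer(ToyPrefetchBuffer&&) noexcept = default;         // member-wise move

    // Copies up to _size bytes of `source` starting at `offset` into the buffer.
    // The backing storage is allocated only on the first call.
    size_t prefetch(const std::string& source, size_t offset) {
        if (_buf.empty()) {
            _buf.resize(_size);  // lazy allocation on the first real prefetch
        }
        if (offset >= source.size()) {
            _len = 0;
            return 0;
        }
        _len = std::min(_size, source.size() - offset);
        std::memcpy(_buf.data(), source.data() + offset, _len);
        return _len;
    }
    bool allocated() const { return !_buf.empty(); }
    size_t capacity_bytes() const { return _buf.size(); }
    const char* data() const { return _buf.data(); }

private:
    size_t _size;
    size_t _len = 0;
    std::vector<char> _buf;
};
```

With many small files, most such buffers may be constructed but never prefetched from, so under this design they never allocate; the defaulted move constructor simply transfers ownership of whatever `_buf` holds.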
## Changes
| File | Change |
|------|--------|
| `buffered_reader.h` | `_buf`: `unique_ptr<char[]>` → `PODArray<char>`; remove eager `new char[]` from constructor |
| `buffered_reader.cpp` | Add lazy-alloc guard (`_buf.empty()` / `_buf.resize`); `.get()` → `.data()` for `PrefetchBuffer` |
| `http_file_reader.h` | `_read_buffer`: `unique_ptr<char[]>` → `PODArray<char>` |
| `http_file_reader.cpp` | `make_unique<char[]>` → `resize`; `.get()` → `.data()` |
| `hdfs_file_system.cpp` | Local copy buffer: `new char[]` → `PODArray<char>` |
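The `hdfs_file_system.cpp` change concerns the 1 MB buffer used to copy a remote file to local disk. The following is a hypothetical sketch of that chunked-copy pattern, not the actual `download_impl` code. `copy_in_chunks` and `kCopyBufferSize` are illustrative names; `std::istream` stands in for the HDFS reader, `std::ostream` for the local writer, and `std::vector<char>` for `PODArray<char>` (without its memory tracking). It shows the owned buffer being reused across reads via `.data()` and released automatically, with no manual `delete[]`.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <vector>

constexpr size_t kCopyBufferSize = 1024 * 1024;  // 1 MB, as in the PR description

// Copies `in` to `out` through one reusable buffer.
// Returns the number of bytes copied, or -1 if a write fails.
long long copy_in_chunks(std::istream& in, std::ostream& out,
                         size_t buf_size = kCopyBufferSize) {
    std::vector<char> buf(buf_size);  // owned buffer, freed automatically on return
    long long total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize n = in.gcount();  // may be short on the last chunk
        if (n <= 0) break;
        out.write(buf.data(), n);  // .data() plays the role the old .get() did
        if (!out) return -1;
        total += n;
    }
    return total;
}
```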