sollhui opened a new pull request, #62032:
URL: https://github.com/apache/doris/pull/62032

   ## Problem
   
   Several IO layer read buffers were allocated via `std::make_unique<char[]>` /
   `new char[]`, which bypasses Doris's memory tracking system (`MemTrackerLimiter`).
   These allocations are invisible to the query memory tracker, leading to
   under-reported memory usage and potential OOM surprises under concurrent
   S3/HDFS scans.
   
   Affected locations:
   - `PrefetchBuffer::_buf` in `buffered_reader`: the main S3/HDFS prefetch buffer
   - `HttpFileReader::_read_buffer`: the 1 MB HTTP read buffer
   - `HdfsFileSystem::download_impl`: the 1 MB copy buffer for remote→local download
   
   ## Solution
   
   Replace raw `unique_ptr<char[]>` allocations with `PODArray<char>`, which uses
   Doris's `Allocator<..., check_and_tracking_memory=true>` internally. Every
   `alloc`/`realloc`/`free` call goes through `consume_memory` / `release_memory`
   on the thread-local `MemTrackerLimiter`, so these buffers are now properly
   accounted for.
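
To illustrate the accounting idea described above, here is a minimal, self-contained sketch. It does not use Doris code: `ToyTracker`, `thread_tracker()`, and `TrackedBuffer` are hypothetical stand-ins for `MemTrackerLimiter`, the thread-local tracker lookup, and `PODArray<char>`. It only shows how routing every allocation and free through a thread-local counter makes a buffer visible to a tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a per-thread memory tracker (not Doris's MemTrackerLimiter).
struct ToyTracker {
    int64_t consumed = 0;
    void consume(int64_t bytes) { consumed += bytes; }
    void release(int64_t bytes) {
        assert(consumed >= bytes);  // releasing more than was consumed is a bug
        consumed -= bytes;
    }
};

// One tracker instance per thread.
inline ToyTracker& thread_tracker() {
    thread_local ToyTracker tracker;
    return tracker;
}

// Hypothetical buffer whose allocations are always reported to the tracker.
class TrackedBuffer {
public:
    TrackedBuffer() = default;  // allocates nothing
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;
    ~TrackedBuffer() { reset(); }

    // Grow or shrink the buffer and report the size change to the tracker.
    void resize(size_t n) {
        if (n == 0) {
            reset();
            return;
        }
        void* p = std::realloc(_data, n);
        if (p == nullptr) throw std::bad_alloc();
        _data = static_cast<char*>(p);
        thread_tracker().release(static_cast<int64_t>(_size));
        thread_tracker().consume(static_cast<int64_t>(n));
        _size = n;
    }

    // Free the buffer and return its bytes to the tracker.
    void reset() {
        if (_data != nullptr) {
            std::free(_data);
            thread_tracker().release(static_cast<int64_t>(_size));
            _data = nullptr;
            _size = 0;
        }
    }

    char* data() { return _data; }
    size_t size() const { return _size; }
    bool empty() const { return _size == 0; }

private:
    char* _data = nullptr;
    size_t _size = 0;
};
```

A buffer obtained from `std::make_unique<char[]>` would never touch `thread_tracker()` in this sketch, which is the kind of gap the PR closes.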
   
   Key behavioral notes:
   - The `PODArray` default constructor allocates no memory, so the lazy-allocation
     optimization in `PrefetchBuffer` (introduced to reduce peak memory during TVF
     scans over many small S3 files) is fully preserved: the buffer is only
     allocated on the first actual prefetch call.
   - `PODArray` supports move semantics, so the existing move constructor of
     `PrefetchBuffer` works unchanged.
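
The two notes above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ToyPrefetchBuffer` and `prefetch_target()` are hypothetical names, and `std::vector<char>` stands in for `PODArray<char>` because it has the same two properties that matter here (an empty default state and move support).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for PrefetchBuffer; std::vector<char> plays the role of PODArray<char>.
class ToyPrefetchBuffer {
public:
    // The constructor only records the desired size; the buffer stays empty.
    explicit ToyPrefetchBuffer(size_t capacity) : _capacity(capacity) {}

    // The defaulted move constructor transfers the buffer without copying it.
    ToyPrefetchBuffer(ToyPrefetchBuffer&&) noexcept = default;

    // Lazy-allocation guard: allocate on the first prefetch only.
    char* prefetch_target() {
        if (_buf.empty()) {
            _buf.resize(_capacity);
        }
        return _buf.data();
    }

    bool allocated() const { return !_buf.empty(); }
    size_t size() const { return _buf.size(); }

private:
    size_t _capacity;
    std::vector<char> _buf;
};
```

Constructing many such objects costs no buffer memory until `prefetch_target()` is first called, which mirrors the `_buf.empty()` / `_buf.resize` guard listed in the changes table.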
   
   ## Changes
   
   | File | Change |
   |------|--------|
   | `buffered_reader.h` | `_buf`: `unique_ptr<char[]>` → `PODArray<char>`; remove eager `new char[]` from constructor |
   | `buffered_reader.cpp` | Add lazy-alloc guard (`_buf.empty()` / `_buf.resize`); `.get()` → `.data()` for `PrefetchBuffer` |
   | `http_file_reader.h` | `_read_buffer`: `unique_ptr<char[]>` → `PODArray<char>` |
   | `http_file_reader.cpp` | `make_unique<char[]>` → `resize`; `.get()` → `.data()` |
   | `hdfs_file_system.cpp` | Local copy buffer: `new char[]` → `PODArray<char>` |
   



