sollhui opened a new pull request, #61821: URL: https://github.com/apache/doris/pull/61821
## Problem

During import/load, file format readers accumulate rows into a block until `batch_size` (row count) is reached. When individual rows are large (e.g., wide schemas, complex nested types), a single batch can exceed hundreds of MBs, causing excessive memory pressure.

## Solution

Add a new BE config `load_reader_max_block_bytes` (default **200MB**, set to `0` to disable) and enforce it across all load file format readers:

| Reader | Mechanism |
|---|---|
| `CsvReader` | Track cumulative line bytes in the read loop; break early when the limit is reached |
| `NewJsonReader` | Check `block->bytes()` in the loop guard before reading each JSON object |
| `ParquetReader` | After each `next_batch()`, adaptively reduce `_batch_size` based on observed bytes/row |
| `OrcReader` | Same adaptive reduction, plus recreate `_batch` with the new size for subsequent reads |

## Config

```
// be.conf
load_reader_max_block_bytes=209715200  # 200MB (default)
load_reader_max_block_bytes=0          # disable limit
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
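The adaptive reduction described for the columnar readers (`ParquetReader`/`OrcReader`) can be sketched as follows. This is a minimal standalone sketch, not the PR's actual code: the function name, parameters, and arithmetic here are illustrative assumptions about "reduce `_batch_size` based on observed bytes/row".

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the adaptive logic the PR describes: after each next_batch(),
// shrink the batch size so the next batch is expected to stay under the
// byte limit at the observed average row width. Names are illustrative.
int64_t adapt_batch_size(int64_t cur_batch_size, int64_t rows_read,
                         int64_t bytes_read, int64_t max_block_bytes) {
    if (max_block_bytes <= 0 || rows_read <= 0) {
        return cur_batch_size;  // limit disabled, or nothing read yet
    }
    int64_t bytes_per_row = std::max<int64_t>(1, bytes_read / rows_read);
    // How many rows fit under the limit at the observed row width.
    int64_t fitting = std::max<int64_t>(1, max_block_bytes / bytes_per_row);
    // Only shrink; never grow past the configured batch size.
    return std::min(cur_batch_size, fitting);
}

// Example: with a 4096-row batch whose rows averaged ~1 MB each and a
// 200 MB cap, the next batch size shrinks to roughly 200 rows.
```

Only shrinking (never growing back) keeps the sketch conservative; whether the real readers re-grow the batch size after observing smaller rows is not stated in the PR description.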

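The row-based readers (`CsvReader`/`NewJsonReader`) instead stop filling the block early. A minimal sketch of that loop-guard pattern, assuming hypothetical `Block` and `fill_block` names that are not Doris code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a column block that tracks its accumulated byte size.
struct Block {
    std::vector<std::string> rows;
    int64_t bytes = 0;
    void append(const std::string& row) {
        rows.push_back(row);
        bytes += static_cast<int64_t>(row.size());
    }
};

// Sketch of the early-break pattern the PR describes: honor both the
// row-count cap (batch_size) and the byte cap in the loop guard, checking
// the block's bytes before reading each additional row.
size_t fill_block(Block& block, const std::vector<std::string>& source,
                  size_t batch_size, int64_t max_block_bytes) {
    size_t rows_read = 0;
    while (rows_read < batch_size && rows_read < source.size() &&
           (max_block_bytes <= 0 || block.bytes < max_block_bytes)) {
        block.append(source[rows_read]);
        ++rows_read;
    }
    return rows_read;
}
```

Note the limit is checked before each append, so a block can exceed the cap by at most one row; capping mid-row would require splitting rows, which load readers avoid.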