sollhui opened a new pull request, #61821:
URL: https://github.com/apache/doris/pull/61821

   ## Problem
   
   During import/load, file format readers accumulate rows into a block until `batch_size` (row count) is reached. When individual rows are large (e.g., wide schemas, complex nested types), a single batch can exceed hundreds of MBs, causing excessive memory pressure.
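
   For a sense of scale (illustrative numbers, not defaults): with rows averaging 50 KB each, a `batch_size` of 4096 rows puts a single block at roughly 4096 × 50 KB ≈ 200 MB.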
   
   ## Solution
   
   Add a new BE config `load_reader_max_block_bytes` (default **200MB**, set to `0` to disable) and enforce it across all load file format readers; minimal sketches of both mechanisms follow below:
   
   | Reader | Mechanism |
   |---|---|
   | `CsvReader` | Track cumulative line bytes in the read loop; break early when the limit is reached |
   | `NewJsonReader` | Check `block->bytes()` in the loop guard before reading each JSON object |
   | `ParquetReader` | After each `next_batch()`, adaptively reduce `_batch_size` based on observed bytes/row |
   | `OrcReader` | Same adaptive reduction, plus recreating `_batch` with the new size for subsequent reads |
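
   To make the row-based mechanism concrete, here is a minimal, self-contained sketch of a byte-bounded accumulation loop in the spirit of `CsvReader`. The function, its signature, and the string-based "block" are illustrative, not the actual reader code; only the row-count/byte-cap logic mirrors the PR.
   
   ```cpp
   #include <cstddef>
   #include <functional>
   #include <string>
   #include <utility>
   #include <vector>
   
   // Sketch: accumulate lines until EITHER the row-count limit OR the
   // cumulative byte limit is hit. `read_line` stands in for any row source;
   // the "block" here is just a vector of lines.
   std::vector<std::string> read_batch(const std::function<bool(std::string*)>& read_line,
                                       size_t batch_size, size_t max_block_bytes) {
       std::vector<std::string> block;
       size_t bytes_read = 0;
       std::string line;
       while (block.size() < batch_size && read_line(&line)) {
           bytes_read += line.size();
           block.push_back(std::move(line));
           // Break early once the block's cumulative bytes hit the cap;
           // a cap of 0 disables the check, matching the config semantics.
           if (max_block_bytes != 0 && bytes_read >= max_block_bytes) break;
       }
       return block; // remaining rows are picked up by the next batch
   }
   ```
   
   `NewJsonReader` applies the same idea in the loop guard, checking `block->bytes()` before deserializing the next JSON object.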
   
   ## Config
   
   ```
   # be.conf
   load_reader_max_block_bytes=209715200   # 200MB (default)
   load_reader_max_block_bytes=0           # disable the limit
   ```
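
   For the columnar readers the cap is enforced adaptively rather than by breaking early: after each `next_batch()` the reader estimates bytes/row and shrinks the batch size used for subsequent reads. A minimal sketch of that calculation, with illustrative names rather than the actual `ParquetReader`/`OrcReader` members:
   
   ```cpp
   #include <algorithm>
   #include <cstddef>
   
   // Sketch: given the rows and bytes produced by the last next_batch() call,
   // pick a smaller batch size so rows * bytes_per_row stays under the byte budget.
   size_t adapt_batch_size(size_t current_batch_size, size_t rows_read,
                           size_t bytes_read, size_t max_block_bytes) {
       if (max_block_bytes == 0 || rows_read == 0 || bytes_read <= max_block_bytes) {
           return current_batch_size; // limit disabled or the last batch fit the budget
       }
       size_t bytes_per_row = std::max<size_t>(1, bytes_read / rows_read);
       size_t shrunk = std::max<size_t>(1, max_block_bytes / bytes_per_row);
       return std::min(current_batch_size, shrunk);
   }
   ```
   
   Per the table above, `OrcReader` additionally recreates `_batch` with the reduced size so subsequent reads allocate batches of the new size.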
   

