ivankelly opened a new issue, #15028:
URL: https://github.com/apache/datafusion/issues/15028

   ### Describe the bug
   
   As discussed on Discord, here's another external sort use case that's failing.
   
   Repro:
   https://github.com/ivankelly/df-repro
   
   To run: 
   ```
   $ bash setup.sh # download the source data
   $ RUST_LOG=trace cargo run
   ...
   Error: Resources exhausted: Failed to allocate additional 1450451 bytes for ParquetSink(ArrowColumnWriter) with 62770337 bytes already allocated for this reservation - 1107184 bytes remain available for the total pool
   ```
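   The numbers in the error amount to simple pool accounting. The `ParquetSink(ArrowColumnWriter)` reservation already holds 62770337 bytes and asks for 1450451 more, but only 1107184 bytes are left in the shared pool, so the request is rejected. Below is a toy model of that check only. `ToyPool`, `try_grow`, and `AllocError` are hypothetical names, not DataFusion's memory pool code. The model also assumes "100MB" means 100 * 1024 * 1024 bytes, which the issue does not specify.

   ```rust
   /// Toy fixed-size memory pool (hypothetical; not DataFusion's implementation).
   struct ToyPool {
       limit: usize,
       used: usize,
   }

   #[derive(Debug, PartialEq)]
   struct AllocError {
       requested: usize,
       available: usize,
   }

   impl ToyPool {
       fn new(limit: usize) -> Self {
           ToyPool { limit, used: 0 }
       }

       /// Bytes still free in the whole pool.
       fn available(&self) -> usize {
           self.limit - self.used
       }

       /// Grow one consumer's reservation by `additional` bytes, or fail
       /// without changing anything if the pool doesn't have enough left.
       fn try_grow(&mut self, reservation: &mut usize, additional: usize) -> Result<(), AllocError> {
           let available = self.available();
           if additional > available {
               return Err(AllocError { requested: additional, available });
           }
           self.used += additional;
           *reservation += additional;
           Ok(())
       }
   }

   fn main() {
       // Assumption: "100MB" is 100 * 1024 * 1024 bytes.
       let limit: usize = 100 * 1024 * 1024;
       let writer_held: usize = 62_770_337; // from the error message
       let remaining: usize = 1_107_184; // from the error message
       // Whatever the other consumers hold in this model.
       let others_held = limit - writer_held - remaining;

       let mut pool = ToyPool::new(limit);
       let mut writer = 0usize;
       let mut others = 0usize;
       pool.try_grow(&mut writer, writer_held).unwrap();
       pool.try_grow(&mut others, others_held).unwrap();
       assert_eq!(pool.available(), remaining);

       // The writer's next request exceeds what is left, so it is rejected.
       match pool.try_grow(&mut writer, 1_450_451) {
           Ok(()) => println!("allocated"),
           Err(e) => println!(
               "failed to allocate additional {} bytes; {} bytes remain available",
               e.requested, e.available
           ),
       }
   }
   ```

   Running it prints `failed to allocate additional 1450451 bytes; 1107184 bytes remain available`. In the reported error, the consumer that fails is the Parquet writer, not the sort itself.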
   
   The code reads a set of Parquet files (889 MB in total) and tries to sort them and write the result to a single Parquet file.
   Memory is limited to 100 MB.
   Changing the batch size or the number of target partitions doesn't help.
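   For background, this query sorts more data (889 MB) than fits in the memory limit (100 MB). That case is normally handled with an external sort: sort memory-sized runs, spill them, then merge the runs. Below is a minimal, self-contained sketch of that general technique under simplifying assumptions. In-memory `Vec`s stand in for spill files, and `budget` counts items rather than bytes. `external_sort` is a hypothetical helper, not how DataFusion implements its sort.

   ```rust
   use std::cmp::Reverse;
   use std::collections::BinaryHeap;

   /// Toy external sort (hypothetical sketch): sort the input in runs of at
   /// most `budget` items, "spill" each sorted run, then k-way merge the runs.
   fn external_sort(input: &[i64], budget: usize) -> Vec<i64> {
       assert!(budget > 0, "memory budget must be positive");

       // Phase 1: build sorted runs that each fit in the budget. A real engine
       // would write each run to a spill file; here runs stay in a Vec.
       let runs: Vec<Vec<i64>> = input
           .chunks(budget)
           .map(|chunk| {
               let mut run = chunk.to_vec();
               run.sort_unstable();
               run
           })
           .collect();

       // Phase 2: k-way merge with a min-heap of (value, run index, position).
       let mut heap = BinaryHeap::new();
       for (i, run) in runs.iter().enumerate() {
           if let Some(&v) = run.first() {
               heap.push(Reverse((v, i, 0usize)));
           }
       }
       let mut out = Vec::with_capacity(input.len());
       while let Some(Reverse((v, i, pos))) = heap.pop() {
           out.push(v);
           // Refill the heap from the run the popped value came from.
           if let Some(&next) = runs[i].get(pos + 1) {
               heap.push(Reverse((next, i, pos + 1)));
           }
       }
       out
   }

   fn main() {
       let data = vec![9, 3, 7, 1, 8, 2, 6, 4, 5, 0];
       let sorted = external_sort(&data, 3);
       println!("{:?}", sorted); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
   }
   ```

   In this model only run generation is bounded by `budget`. The merge phase keeps one pending element per run in the heap, which in a real engine corresponds to a read buffer per spill file.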
   
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_

