cetra3 commented on PR #20159:
URL: https://github.com/apache/datafusion/pull/20159#issuecomment-3857764965

   OK, I've tried to get some benchmark results using disk spilling with the
   `tpch` benchmark. With the memory limit set to 1G, this branch runs
   successfully every time. On `main` it fails frequently with the Arrow error,
   but I managed to get it to complete once or twice. However, `Query 21`
   appears to be the only query where any spilling happens.
   
   On this branch:
   
   ```
   Query 21 iteration 0 took 1596.5 ms and returned 100 rows
   Query 21 iteration 1 took 2269.5 ms and returned 100 rows
   Query 21 iteration 2 took 2515.2 ms and returned 100 rows
   Query 21 iteration 3 took 2907.9 ms and returned 100 rows
   Query 21 iteration 4 took 3504.3 ms and returned 100 rows
   Query 21 avg time: 2558.70 ms
   ```
   
   On `main` the best I got was:
   
   ```
   Query 21 iteration 0 took 3121.8 ms and returned 100 rows
   Query 21 iteration 1 took 2731.7 ms and returned 100 rows
   Query 21 iteration 2 took 3314.0 ms and returned 100 rows
   Query 21 iteration 3 took 4073.1 ms and returned 100 rows
   Query 21 iteration 4 took 5101.1 ms and returned 100 rows
   Query 21 avg time: 3668.34 ms
   ```
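As a quick cross-check of the numbers above, here is a minimal std-only Rust sketch that parses the iteration lines and recomputes each average and the relative speedup. The helper names `iteration_times` and `mean` are hypothetical and not part of the DataFusion benchmark tooling; the sketch assumes the exact `Query N iteration I took X ms ...` line format printed above.

```rust
// Hypothetical helper (not part of the benchmark tooling): extracts the
// "took <ms> ms" value from each "Query N iteration I took X ms ..." line.
fn iteration_times(log: &str) -> Vec<f64> {
    log.lines()
        .map(str::trim)
        // Skip the "avg time" summary line and anything else without a timing.
        .filter(|line| line.contains(" iteration "))
        .filter_map(|line| {
            let after_took = line.split(" took ").nth(1)?;
            after_took.split_whitespace().next()?.parse::<f64>().ok()
        })
        .collect()
}

// Arithmetic mean; yields NaN for an empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    // Per-iteration timings copied from the two runs above.
    let branch_log = "\
Query 21 iteration 0 took 1596.5 ms and returned 100 rows
Query 21 iteration 1 took 2269.5 ms and returned 100 rows
Query 21 iteration 2 took 2515.2 ms and returned 100 rows
Query 21 iteration 3 took 2907.9 ms and returned 100 rows
Query 21 iteration 4 took 3504.3 ms and returned 100 rows
Query 21 avg time: 2558.70 ms";
    let main_log = "\
Query 21 iteration 0 took 3121.8 ms and returned 100 rows
Query 21 iteration 1 took 2731.7 ms and returned 100 rows
Query 21 iteration 2 took 3314.0 ms and returned 100 rows
Query 21 iteration 3 took 4073.1 ms and returned 100 rows
Query 21 iteration 4 took 5101.1 ms and returned 100 rows
Query 21 avg time: 3668.34 ms";

    let branch_avg = mean(&iteration_times(branch_log));
    let main_avg = mean(&iteration_times(main_log));

    println!("branch avg: {:.2} ms", branch_avg);
    println!("main avg:   {:.2} ms", main_avg);
    // Ratio > 1.0 means this branch was faster than main.
    println!("speedup:    {:.2}x", main_avg / branch_avg);
}
```

With the timings above this gives roughly 2558.68 ms vs 3668.34 ms, about a 1.43x speedup for this branch on `Query 21`. The small gap from the reported 2558.70 ms is presumably because the printed per-iteration times are rounded.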
   
   I've attached the JSON of both runs:
   
   [main.json](https://github.com/user-attachments/files/25117450/main.json)
   
[fix_spill_read_underrun.json](https://github.com/user-attachments/files/25117448/fix_spill_read_underrun.json)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]