Re: [I] Force compact in topK when we hit memory limit [datafusion]

via GitHub Mon, 29 Dec 2025 04:42:50 -0800


bharath-techie commented on issue #19386:
URL: https://github.com/apache/datafusion/issues/19386#issuecomment-3696425376


   Yes, agreed that compaction has some performance impact. We also tried one 
more approach: a jemalloc-based memory pool like the one InfluxDB uses for 
accounting, except that we used it directly as the memory pool for allocations. 
That also seemed to work well in initial testing and overcame the 
multiple-counting issue, though we haven't tested it extensively.
   
   I'm happy to try out a working solution on top of #19501 for topK as well 
once it's ready. I was already tracking the associated GitHub issues and it 
looks promising :) 
   
   I still feel that force compacting in topK instead of throwing an error is a 
good fallback. At least the query will go through after it has already done a 
significant amount of work in the group-by etc.
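
   To make the fallback concrete, here is a toy sketch of the idea. It is not 
DataFusion's actual TopK or MemoryPool code: `Pool`, `TopK`, `try_resize`, 
`compact` and `insert` are all hypothetical names, and batches are modelled as 
plain `Vec<i64>`. When growing the reservation fails, it compacts down to the k 
retained rows and retries once before giving up.

```rust
use std::mem::size_of;

/// Hypothetical bounded pool; a stand-in, not DataFusion's `MemoryPool` trait.
struct Pool {
    limit: usize,
    used: usize,
}

impl Pool {
    /// Replace a reservation of `old` bytes with one of `new` bytes.
    fn try_resize(&mut self, old: usize, new: usize) -> Result<(), String> {
        let next = self.used - old + new;
        if next > self.limit {
            return Err(format!("memory limit exceeded: {} > {}", next, self.limit));
        }
        self.used = next;
        Ok(())
    }
}

/// Toy top-K (smallest k values) that keeps whole input batches alive
/// until it is compacted.
struct TopK {
    k: usize,
    batches: Vec<Vec<i64>>,
    reserved: usize,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, batches: Vec::new(), reserved: 0 }
    }

    /// Bytes held by all retained batches.
    fn size(&self) -> usize {
        self.batches.iter().map(|b| b.len() * size_of::<i64>()).sum()
    }

    /// Copy the k smallest rows into one fresh batch and drop the rest.
    fn compact(&mut self) {
        let mut all: Vec<i64> = self.batches.drain(..).flatten().collect();
        all.sort_unstable();
        all.truncate(self.k);
        self.batches.push(all);
    }

    /// Returns Ok(true) if a forced compaction was needed to stay in budget.
    fn insert(&mut self, pool: &mut Pool, batch: Vec<i64>) -> Result<bool, String> {
        self.batches.push(batch);
        let want = self.size();
        if pool.try_resize(self.reserved, want).is_ok() {
            self.reserved = want;
            return Ok(false);
        }
        // Fallback: compact first, and only error out if that is still too big.
        self.compact();
        let want = self.size();
        pool.try_resize(self.reserved, want)?;
        self.reserved = want;
        Ok(true)
    }
}
```

   With a 64-byte pool and k = 2, the third 4-row batch would exceed the limit, 
so the sketch compacts to 2 rows (16 bytes) and the insert succeeds instead of 
failing the query.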
   
   Plus, with your change we might not end up force compacting in many 
scenarios, since we'll be able to account for memory correctly.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

