Re: [PR] Adding benchmark for external aggregation [datafusion]

via GitHub Fri, 25 Oct 2024 03:28:20 -0700


2010YOUY01 commented on PR #13090:
URL: https://github.com/apache/datafusion/pull/13090#issuecomment-2437436375


   Thank you @Dandandan @alamb , I have updated it as per the reviews.
   
   > Thanks @2010YOUY01 -- I agree with @Dandandan -- very nice 👌
   > 
   > I also plan to read the linked paper -- 🤓
   
   Really nice paper, we can implement the same benchmark and compare in the 
future 😄 
   They implemented a unified buffer pool for both the table data cache and 
operator (e.g. aggregation) intermediate results, which makes it easy to 
support spilling in various operators.
   I think they didn't mention any optimization specific to the spilling part 
of aggregation, and just use a simple LRU policy in the buffer pool.
   Maybe there are some spilling- and merging-specific optimizations we can 
explore (memory-limited aggregation, SortMergeJoin, and Sort could all benefit 
from them).
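   
   To make the unified buffer pool idea concrete, here is a rough sketch of how 
I read it. This is my own simplification, not the paper's code and not an 
existing DataFusion API: `BufferPool`, `PageKind`, and the in-memory `spilled` 
map (a stand-in for a real spill file) are all made-up names.

```rust
use std::collections::{HashMap, VecDeque};

/// Who owns a page: cached table data or operator intermediate state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageKind {
    TableCache,
    OperatorState,
}

/// Hypothetical unified pool: one page budget and one LRU queue for all pages.
struct BufferPool {
    capacity: usize,                             // max pages kept in memory
    resident: HashMap<u64, (PageKind, Vec<u8>)>, // pages currently in memory
    lru: VecDeque<u64>,                          // front = least recently used
    spilled: HashMap<u64, (PageKind, Vec<u8>)>,  // stand-in for a spill file
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "pool needs room for at least one page");
        Self {
            capacity,
            resident: HashMap::new(),
            lru: VecDeque::new(),
            spilled: HashMap::new(),
        }
    }

    /// Mark a page as most recently used.
    fn touch(&mut self, id: u64) {
        self.lru.retain(|&p| p != id);
        self.lru.push_back(id);
    }

    /// Insert or overwrite a page, spilling LRU pages while over budget.
    fn put(&mut self, id: u64, kind: PageKind, data: Vec<u8>) {
        self.spilled.remove(&id);
        self.resident.insert(id, (kind, data));
        self.touch(id);
        while self.resident.len() > self.capacity {
            match self.lru.pop_front() {
                Some(victim) => {
                    if let Some(page) = self.resident.remove(&victim) {
                        self.spilled.insert(victim, page);
                    }
                }
                None => break,
            }
        }
    }

    /// Read a page, reloading it from the spill area if it was evicted.
    fn get(&mut self, id: u64) -> Option<&[u8]> {
        if self.resident.contains_key(&id) {
            self.touch(id);
        } else {
            let (kind, data) = self.spilled.remove(&id)?;
            self.put(id, kind, data);
        }
        self.resident.get(&id).map(|(_, d)| d.as_slice())
    }

    fn is_resident(&self, id: u64) -> bool {
        self.resident.contains_key(&id)
    }

    /// Number of operator-state pages currently spilled.
    fn spilled_operator_pages(&self) -> usize {
        self.spilled
            .values()
            .filter(|(kind, _)| *kind == PageKind::OperatorState)
            .count()
    }
}
```

   The point is only that table pages and operator state compete in the same 
LRU queue, so any operator gets spilling for free. A real pool would track 
bytes rather than page counts, and could probably just drop table cache pages 
instead of writing them out, since they can be re-read from the source.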
   
   > Also, are you interested in improving DataFusion's external aggregation 
capabilities? I think it is a non trivial gap at the moment and would be great 
to improve (and I would be interested in helping do so).
   > 
   > if you are, I can start organizing the work into some tickets to see if we 
can get some others to check it out too
   
   Yes, I'm starting to look at the related components now. Perhaps we can 
start by making memory-limited SQL queries more stable (e.g. more tests, making 
sure TPCH-SF1000 can run correctly on a laptop), and optimize later.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]