2010YOUY01 commented on PR #13090:
URL: https://github.com/apache/datafusion/pull/13090#issuecomment-2437436375

   Thank you @Dandandan @alamb, I have updated it based on the reviews.
   
   > Thanks @2010YOUY01 -- I agree with @Dandandan -- very nice 👌
   > 
   > I also plan to read the linked paper -- 🤓
   
   It's a really nice paper; we could implement the same benchmark and compare results in the future 😄 
   They implemented a unified buffer pool for both the table data cache and the intermediate results of operators such as aggregation, which makes it easy to support spilling across various operators.
   I think they didn't describe any optimization specific to the spilling part of aggregation; they just used a simple LRU policy in the buffer pool.
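   To make the unified-pool idea concrete, here is a minimal Rust sketch under stated assumptions: one fixed byte budget, plain LRU eviction, and two kinds of pages competing for it. Evicting a cached table page just drops it, while evicting an operator's intermediate page spills it. This is not the paper's implementation and not a DataFusion API; `BufferPool`, `PageKind`, `insert`, and `get` are hypothetical names, and "spilling" is simulated with an in-memory map instead of real disk I/O.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical page kinds sharing one memory budget.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PageKind {
    /// Cached table data: can be dropped on eviction and re-read from the table.
    TableData,
    /// Operator intermediate state (e.g. aggregation): must be spilled on eviction.
    Temporary,
}

struct Page {
    kind: PageKind,
    data: Vec<u8>,
}

/// Toy unified buffer pool (hypothetical, not a DataFusion API) with LRU eviction.
struct BufferPool {
    capacity: usize,
    used: usize,
    pages: HashMap<u64, Page>,
    /// Page ids ordered from least to most recently used.
    lru: VecDeque<u64>,
    /// Stand-in for spill files on disk (simulated in memory).
    spilled: HashMap<u64, Vec<u8>>,
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            used: 0,
            pages: HashMap::new(),
            lru: VecDeque::new(),
            spilled: HashMap::new(),
        }
    }

    /// Insert (or replace) a page, evicting LRU pages until it fits.
    fn insert(&mut self, id: u64, kind: PageKind, data: Vec<u8>) {
        assert!(data.len() <= self.capacity, "page larger than the whole pool");
        if let Some(old) = self.pages.remove(&id) {
            self.used -= old.data.len();
            self.lru.retain(|&p| p != id);
        }
        while self.used + data.len() > self.capacity {
            let victim = self.lru.pop_front().expect("LRU list out of sync");
            let page = self.pages.remove(&victim).expect("LRU entry without page");
            self.used -= page.data.len();
            if page.kind == PageKind::Temporary {
                // Intermediate results cannot be cheaply recomputed: spill them.
                self.spilled.insert(victim, page.data);
            }
            // TableData pages are simply dropped.
        }
        self.used += data.len();
        self.pages.insert(id, Page { kind, data });
        self.lru.push_back(id);
    }

    /// Look up a page, reloading it if it was spilled.
    /// Returns None for dropped table pages (the caller would re-read the table).
    fn get(&mut self, id: u64) -> Option<&[u8]> {
        if self.pages.contains_key(&id) {
            // Move the page to the most-recently-used position.
            self.lru.retain(|&p| p != id);
            self.lru.push_back(id);
        } else {
            let data = self.spilled.remove(&id)?;
            self.insert(id, PageKind::Temporary, data);
        }
        self.pages.get(&id).map(|p| p.data.as_slice())
    }
}
```

   The point of sharing one pool is that the pool, not each operator, decides what leaves memory: a dropped table page is cheap to recover, while an evicted intermediate page has to be written out and later reloaded, which is exactly where spilling-specific optimizations could come in.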
   Maybe there are some spilling- and merging-specific optimizations we can explore, which memory-limited aggregation, SortMergeJoin, and Sort could all benefit from.
   
   > Also, are you interested in improving DataFusion's external aggregation 
capabilities? I think it is a non trivial gap at the moment and would be great 
to improve (and I would be interested in helping do so).
   > 
   > if you are, I can start organizing the work into some tickets to see if we 
can get some others to check it out too
   
   Yes, I've started looking at the related components now. Perhaps we can start by making memory-limited SQL queries more stable (e.g. adding more tests and making sure TPC-H SF1000 runs correctly on a laptop), and optimize later.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

