2010YOUY01 commented on PR #13090: URL: https://github.com/apache/datafusion/pull/13090#issuecomment-2437436375
Thank you @Dandandan @alamb , I have updated it as per the reviews. > Thanks @2010YOUY01 -- I agree with @Dandandan -- very nice 👌 > > I also plan to read the linked paper -- 🤓 Really nice paper, we can implement the same benchmark and compare in the future 😄 They implemented a unified buffer pool for both table data cache and operator (like aggregation) intermediate results, to easily support spilling in various operators. I think they didn't mention any optimization specific to the spilling part of aggregation, and just use simple LRU policy in the buffer pool. Maybe there are some spilling and merging specific optimizations we can explore (all of memory-limited aggregate/SortMergeJoin/Sort can benefit from) > Also, are you interested in improving DataFusion's external aggregation capabilities? I think it is a non trivial gap at the moment and would be great to improve (and I would be interested in helping do so). > > if you are, I can start organizing the work into some tickets to see if we can get some others to check it out too Yes, I'm start to look at related components now. Perhaps we can start with making memory-limited SQL queries more stable (e.g. more tests, make sure TPCH-SF1000 is able to run on laptop correctly), and later optimize. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org