Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

via GitHub Mon, 15 Jul 2024 03:25:05 -0700


alamb commented on issue #11442:
URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2228171938


   ## Aggregate performance / memory use for high cardinality aggregates
   * https://github.com/apache/datafusion/issues/6937
   
   **What**: Improve query performance when the number of groups is very high (1 million+)
   **Why**: Queries with a very high number of groups are significantly slower 
in DataFusion than in DuckDB and use substantially more memory. I think there 
is at least a factor of 2 in performance to gain here
   **What is left**: There are ideas on 
https://github.com/apache/datafusion/issues/6937, but someone has to prototype 
them, see if they would work, and then productionize them
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


