MrPowers commented on issue #13548:
URL: https://github.com/apache/datafusion/issues/13548#issuecomment-2504404645

   The h2o benchmarks are run on an Intel(R) Xeon(R) Platinum 8375C CPU @ 
2.90GHz machine with 128 cores and 250 GB of RAM.
   
   DataFusion groupby queries perform well on the 100 million row dataset (~5GB 
of data in a CSV file):
   
   <img width="789" alt="Screenshot 2024-11-27 at 12 10 41 PM" 
src="https://github.com/user-attachments/assets/a50d9dda-6405-44de-bb99-65c100dc0078">
   
   Some of these queries don't run on the 1 billion row dataset (~50GB of 
data in an uncompressed CSV file):
   
   <img width="802" alt="Screenshot 2024-11-27 at 12 12 20 PM" 
src="https://github.com/user-attachments/assets/03414d79-3103-4f13-8ff2-7eec325333d2">
   
   I am using an M3 MacBook with 16 GB of RAM.  How much RAM does your machine 
have?  Perhaps DataFusion only struggles with query 9 when the machine doesn't 
have much spare RAM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

