pmcgleenon commented on issue #17721:
URL: https://github.com/apache/datafusion/issues/17721#issuecomment-3476400608

   Now that the ClickBench results cover multiple instance types for 
datafusion, I've created some [automation to run the clickbench 
tests](https://github.com/pmcgleenon/datafusion-clickbench-runner). If anyone 
would like to try it out, I would be interested in your feedback!
   
   Originally the ClickBench datafusion tests only used 1 instance type: 
`c6a.4xlarge`.
   Recently the ClickHouse team added new instance types to the results for 
datafusion: 
[c6a.xlarge](https://github.com/ClickHouse/ClickBench/commit/8ab83ad9c1745f71edd1105426babe32d41a7be8),
 
[c6a.2xlarge](https://github.com/ClickHouse/ClickBench/commit/c11483588a72b3943971e0550a62e689a138cc23)
 and 
[c8g.4xlarge](https://github.com/ClickHouse/ClickBench/commit/b247b2045583558412fff8c67d68fae0765ad71d).
   
   The instance type 
[`c6a.xlarge`](https://instances.vantage.sh/aws/ec2/c6a.xlarge) only has 8GB 
RAM (the clickbench dataset is 15GB), and there are a number of issues with 
it, including:
   - datafusion compilation fails on this instance. We can work around this by 
using `brew install`
   - OOM errors happen during test execution, with results reported as null 
for several tests
   - the machine becomes unresponsive, with ssh and shell not working during 
test execution
   
   I see two options here:
   1.  report results for the `c6a.xlarge` and use the brew install workaround 
to get around the compilation issues. In this case some of the results will be 
null.   I didn't see a way to specify a particular datafusion version with brew 
install, so it will always pick up the latest version (currently 50.3.0)
   2.  remove `c6a.xlarge` from the results until datafusion becomes functional 
on the 8GB RAM machine.  We would have the option to compile datafusion with 
`target-cpu=native` to squeeze out some more performance
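
   For option 1, here is a rough way to see which queries end up null in a 
results file. This is only a minimal sketch: it assumes the results JSON keeps 
a `result` array with one list of run timings per query and that failed runs 
show up as `null`. The `null_queries` helper and the sample data are made up 
for illustration and are not taken from a real run.

```python
import json


def null_queries(result_doc):
    """Return 1-based query numbers where at least one run timing is null."""
    return [
        number
        for number, runs in enumerate(result_doc["result"], start=1)
        if any(timing is None for timing in runs)
    ]


# Made-up sample in the assumed layout, not real benchmark output.
sample = json.loads(
    '{"machine": "c6a.xlarge",'
    ' "result": [[0.1, 0.05, 0.05], [null, null, null], [1.2, null, 0.9]]}'
)

print(null_queries(sample))
```

   Running something like this over each instance type's results file would 
show how many queries are affected before deciding between the two options.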
    
   I think we should avoid reporting results for `c6a.xlarge` until the issues 
are resolved, particularly since running the tests has such a negative impact 
on the server.
   
   @alamb @Dandandan and everyone else - I'm interested in your opinions on 
the way forward here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

