andygrove commented on PR #21508: URL: https://github.com/apache/datafusion/pull/21508#issuecomment-4215456735
> I would say if we have github bot action, similar to `run benchmarks` on the PR would help remove the local testing part. How this script is planned to be called?

It would be nice to eventually add a GitHub workflow to run this, but for now it is probably best just to make the script available for people to run.

Many of the tests are written in such a way that we cannot support them in PySpark, which makes this quite challenging. The Comet approach is much nicer, but there is no way in this repo to actually run the DataFusion expressions from within Spark, so we cannot use Spark SQL for the tests.

I suppose we could update the SQL parser crate to support Spark SQL and update the planner to support using Spark expressions, and then the tests could be written in Spark SQL. That sounds like a lot of work, though.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
