linhr commented on issue #17045: URL: https://github.com/apache/datafusion/issues/17045#issuecomment-3173197643
I brought up a similar idea in <https://github.com/apache/datafusion/issues/15914#issuecomment-2935366003>. The idea was to use the `pyspark` library to generate the SLT files, so that we only define the inputs manually. Whenever the SLT files change, CI would validate them to ensure that the output actually matches the Spark behavior. In general, I agree that some CI setup (triggered only when needed) that leverages the original Spark implementation would help ensure the correctness of the test suites for `datafusion-spark`.
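To make the idea concrete, here is a rough sketch of what such a generator might look like. It is not an existing script in the repository. The names `render_slt`, `format_value`, and `spark_runner` are hypothetical, and the column type string (for example `I` or `TT`) is assumed to be supplied by hand along with each query. The value formatting (space-separated cells, `NULL` for missing values) is also an assumption and would likely need adjusting to match what the DataFusion sqllogictest runner expects, for example for booleans and floats.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Manually defined input: (SQL text, sqllogictest column type string).
Case = Tuple[str, str]
# A runner executes one SQL string and returns the result rows.
Runner = Callable[[str], List[Sequence[object]]]


def format_value(value: object) -> str:
    """Render one result cell (assumption: NULL is written as 'NULL')."""
    return "NULL" if value is None else str(value)


def render_slt(cases: Iterable[Case], run_query: Runner) -> str:
    """Build SLT text by running each query and recording its output."""
    blocks = []
    for sql, types in cases:
        rows = run_query(sql)
        lines = [f"query {types}", sql, "----"]
        # One line per result row, cells separated by a single space.
        lines.extend(" ".join(format_value(v) for v in row) for row in rows)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def spark_runner() -> Runner:
    """Return a runner backed by a local Spark session (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    return lambda sql: [tuple(row) for row in spark.sql(sql).collect()]
```

With `pyspark` installed, something like `render_slt([("SELECT 1 + 1;", "I")], spark_runner())` would produce the text for one SLT record. A CI job triggered only when SLT files change could regenerate the text this way and compare it with the checked-in files.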