alamb opened a new issue, #4248: URL: https://github.com/apache/arrow-datafusion/issues/4248
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I would like to ensure that DataFusion gets the correct answers for SQL queries (especially in tricky corner cases like the one described in https://github.com/apache/arrow-datafusion/issues/4211) From experience, both in DataFusion and in prior jobs, the effort required to maintain tests (both to add new tests as well as update existing tests) is substantial. Making it easier to add new tests and maintain existing ones will help us keep up velocity. Right now, we have two sql integration style tests: 1. the `integration` test from @Jimexist 🦾 https://github.com/apache/arrow-datafusion/tree/master/integration-tests: Runs a limited number of queries against data in both postgres and datafusion and compares the results * The `sql_integration` test https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/tests/sql: test setup, execution, and verification is written in rust. The challenge with sql_integration test is that to add new tests or update existing ones, we need to change rust code and recompile, which takes a *loong* time Likewise, the integration test requires that the results are exactly the same as postgres which is not possible in all cases (like when testing for unsigned types, which postgres doesn't support, or testing some DataFusion specific thing) **Describe the solution you'd like** I would like some sort of data driven test to replace sql_integration You can see this style of test in duckdb: https://github.com/duckdb/duckdb/tree/master/test/sql/join My ideal solution would be to implement a runner (ideally the same as [SQLLogicTests](https://duckdb.org/dev/testing#sqllogictests) from DuckDB) 2. Using the same data file format as duckdb (will mean we could reuse their tests without much modification) 3. Start porting as many of the tests in sql_integration over to this new format as possible) I implemented a impler version of this approach in https://github.com/influxdata/influxdb_iox/blob/main/query_tests/README.md which runs sql queries from a file and compares the result to known output. I think the duckdb way is superior **Describe alternatives you've considered** Leave things the same **Additional context** Add any other context or screenshots about the feature request here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
