[GitHub] [arrow-datafusion] alamb opened a new issue, #4248: Make a data driven SQL testing tool (so we can reuse duckdb test suite, example)

GitBox Wed, 16 Nov 2022 12:08:34 -0800


alamb opened a new issue, #4248:
URL: https://github.com/apache/arrow-datafusion/issues/4248


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   I would like to ensure that DataFusion gets the correct answers for SQL 
queries (especially in tricky corner cases like the one described in 
https://github.com/apache/arrow-datafusion/issues/4211)
   
   From experience, both in DataFusion and in prior jobs, the effort required 
to maintain tests (both to add new tests as well as update existing tests) is 
substantial. Making it easier to add new tests and maintain existing ones will 
help us keep up velocity. 
   
   Right now, we have two SQL integration-style tests:
   
   1. The `integration` test from @Jimexist 🦾  
https://github.com/apache/arrow-datafusion/tree/master/integration-tests: runs 
a limited number of queries against data in both Postgres and DataFusion and 
compares the results.
   2. The `sql_integration` test 
https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/tests/sql:
 test setup, execution, and verification are written in Rust.
   
   The challenge with the sql_integration test is that to add new tests or 
update existing ones, we need to change Rust code and recompile, which takes a 
*loong* time.
   
   Likewise, the integration test requires that the results are exactly the 
same as Postgres, which is not possible in all cases (like when testing for 
unsigned types, which Postgres doesn't support, or testing some 
DataFusion-specific thing).
   
   
   
   **Describe the solution you'd like**
   I would like some sort of data driven test to replace sql_integration
   
   You can see this style of test in duckdb:  
https://github.com/duckdb/duckdb/tree/master/test/sql/join
   
   My ideal solution would be to:
   1. Implement a runner (ideally the same as 
[SQLLogicTests](https://duckdb.org/dev/testing#sqllogictests) from DuckDB)
   2. Use the same data file format as duckdb (which means we could reuse their 
tests without much modification)
   3. Start porting as many of the tests in sql_integration over to this new 
format as possible
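
To make the idea concrete, here is a rough sketch of what a data-driven runner could look like. It assumes a heavily simplified subset of the sqllogictest record format (`statement ok` / `statement error` and `query <types>` records separated by blank lines, with `----` before the expected rows). The real format has more features (sort modes, labels, conditional records) that are ignored here. `Record`, `parse_script`, `run`, and the `execute` closure are hypothetical names for illustration only; `execute` stands in for the engine and is not a DataFusion API.

```rust
/// One record of a simplified sqllogictest-style script.
#[derive(Debug, PartialEq)]
enum Record {
    /// `statement ok` or `statement error`, followed by the SQL text.
    Statement { sql: String, expect_ok: bool },
    /// `query <types>`, the SQL text, `----`, then the expected rows.
    Query { sql: String, expected: Vec<String> },
}

/// Parse a script into records. Records are separated by blank lines;
/// lines starting with `#` are treated as comments and dropped.
fn parse_script(script: &str) -> Vec<Record> {
    let normalized = script.replace("\r\n", "\n");
    let mut records = Vec::new();
    for block in normalized.split("\n\n") {
        let lines: Vec<&str> = block
            .lines()
            .map(str::trim_end)
            .filter(|l| !l.is_empty() && !l.starts_with('#'))
            .collect();
        let (header, body) = match lines.split_first() {
            Some(parts) => parts,
            None => continue, // block was empty or only comments
        };
        let mut words = header.split_whitespace();
        match words.next() {
            Some("statement") => records.push(Record::Statement {
                sql: body.join("\n"),
                expect_ok: words.next() == Some("ok"),
            }),
            Some("query") => {
                // Everything before `----` is SQL, everything after is expected output.
                let pos = body.iter().position(|l| *l == "----").unwrap_or(body.len());
                records.push(Record::Query {
                    sql: body[..pos].join("\n"),
                    expected: body.iter().skip(pos + 1).map(|s| s.to_string()).collect(),
                });
            }
            _ => {} // unsupported record kinds are skipped in this sketch
        }
    }
    records
}

/// Run all records through `execute` (a stand-in for the SQL engine)
/// and return the number of records that did not behave as expected.
fn run<F>(records: &[Record], mut execute: F) -> usize
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    let mut failures = 0;
    for record in records {
        let passed = match record {
            Record::Statement { sql, expect_ok } => execute(sql.as_str()).is_ok() == *expect_ok,
            Record::Query { sql, expected } => {
                execute(sql.as_str()).map_or(false, |rows| rows == *expected)
            }
        };
        if !passed {
            failures += 1;
        }
    }
    failures
}

fn main() {
    let script = "\
# create a table
statement ok
CREATE TABLE t(a INTEGER)

query I
SELECT 1 + 1
----
2
";
    let records = parse_script(script);
    // Fake engine used only to exercise the runner.
    let failures = run(&records, |sql| match sql {
        "SELECT 1 + 1" => Ok(vec!["2".to_string()]),
        _ => Ok(vec![]),
    });
    println!("{} records, {} failures", records.len(), failures);
}
```

The point of this shape is that adding or updating a test only means editing the script text, so no Rust code has to change or be recompiled.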
   
   I implemented a simpler version of this approach in 
https://github.com/influxdata/influxdb_iox/blob/main/query_tests/README.md 
which runs sql queries from a file and compares the result to known output.  I 
think the duckdb way is superior
   
   **Describe alternatives you've considered**
   Leave things the same
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
