I agree with Stamatis that this has a similar “shape” to Quidem. I’d be happy 
to host the project under github.com/hydromatic. (If the maven group is 
net.hydromatic I can publish artifacts to Maven Central and Calcite could 
depend on those artifacts.)

Regarding the frequency of testing: if we add it to CI and (say) 5% of the 
tests fail, I would find that demoralizing, even though passing 95% of the 
tests is actually a great achievement. So I would only deploy it as part of CI 
if there is a way to exclude failing tests.
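
To illustrate, here is a minimal sketch of what "a way to exclude failing 
tests" might look like. It assumes each test has a stable string id and that 
known failures are kept in a plain-text list. The class KnownFailures and the 
ids such as "a.test:1" are hypothetical; nothing here is taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of Calcite or the slt PR): tracks tests
 * that are known to fail so that CI only breaks on new failures. */
final class KnownFailures {
  // Sorted so that reports come out in a deterministic order.
  private final Set<String> excluded = new TreeSet<>();

  /** Reads test ids from the lines of an exclusion list.
   * Blank lines and lines starting with '#' are ignored. */
  KnownFailures(List<String> lines) {
    for (String line : lines) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
        excluded.add(trimmed);
      }
    }
  }

  /** Returns failures that are not on the exclusion list; a CI job
   * would fail the build only if this list is non-empty. */
  List<String> unexpectedFailures(List<String> failedTests) {
    List<String> result = new ArrayList<>();
    for (String test : failedTests) {
      if (!excluded.contains(test)) {
        result.add(test);
      }
    }
    return result;
  }

  /** Returns excluded tests that did not fail in this run, so the list
   * can shrink as bugs are fixed. Assumes every excluded test was run. */
  List<String> fixedTests(List<String> failedTests) {
    Set<String> failed = new HashSet<>(failedTests);
    List<String> result = new ArrayList<>();
    for (String test : excluded) {
      if (!failed.contains(test)) {
        result.add(test);
      }
    }
    return result;
  }
}
```

With something along these lines, the build would stay green while the known 
5% keeps failing, a new failure would still break it, and fixedTests would 
point out entries that can be dropped from the list.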

If the SqlLogicTest tool were defined in another repo, then there could be a 
Calcite module under plus [1] similar to TpchTest.

Julian

[1] https://github.com/apache/calcite/tree/main/plus 



> On Apr 17, 2023, at 1:58 AM, Stamatis Zampetakis <[email protected]> wrote:
> 
> Hey Mihai,
> 
> Thanks for starting this discussion!
> 
> Let's focus on the first question for now:
> 
> Q1: Should the new slt module under PR-3145 [1] become part of Calcite
> repo or get its own?
> 
> For those who have not followed the discussion under the CALCITE-5615
> [2] let me try to summarize a few things as per my understanding;
> Mihai can amend/correct things if necessary.
> 
> The new slt module resembles a port of sqllogictest utility [3] to
> Java. It can parse and understand the test-script format used in
> sqllogictest and can run these scripts over JDBC-compliant databases.
> It also accounts for extensions for Java engines without a JDBC
> interface.
> 
> From my perspective, the code in [1] could perfectly stand on its own
> in a separate repo; there are already ports of sqllogictest in other
> languages such as Rust [4] and the latter appears to be quite popular.
> The sqllogictest parser/runner presents some similarities with the
> Quidem [5] executor that we are using for certain tests in Calcite.
> The Quidem project has its own repo although we are making use of it
> in Calcite.
> If it becomes a separate repo then the test scripts could also become
> part of the project making it more self-contained.
> 
> On the other hand, we already have a testkit module in Calcite, so
> bringing in new modules for testing purposes is not out of place; why
> not slt as well? If it becomes part of Calcite it can get more visibility
> and facilitate maintenance since more people would be able to review
> and merge changes (not only Mihai).
> 
> Since we are talking about a new module I would like to see some more
> people share their opinion on the topic before I continue the review.
> 
> Best,
> Stamatis
> 
> [1] https://github.com/apache/calcite/pull/3145
> [2] https://issues.apache.org/jira/browse/CALCITE-5615
> [3] https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
> [4] https://github.com/risinglightdb/sqllogictest-rs
> [5] https://github.com/julianhyde/quidem
> 
> 
> 
> On Sat, Apr 15, 2023 at 11:31 AM Michael Mior <[email protected]> wrote:
>> 
>> Very cool! One approach could be to set these tests to run periodically
>> (daily/weekly) as opposed to being part of the CI pipeline. That way we
>> still have a mechanism to keep tabs on bugs but the whole build isn't
>> slow/broken until this is fixed.
>> 
>> On Fri, Apr 14, 2023, 15:20 Mihai Budiu <[email protected]> wrote:
>> 
>>> Hello all,
>>> 
>>> I have submitted a PR for Calcite with a standalone executable that runs
>>> the Sql Logic Test suite of 7+ million tests from sqlite.
>>> 
>>> This is the JIRA case: https://issues.apache.org/jira/browse/CALCITE-5615
>>> And this is the PR: https://github.com/apache/calcite/pull/3145
>>> 
>>> As Stamatis pointed out, the PR isn't really specific to Calcite, it is a
>>> general framework in Java to run these tests on any JDBC compliant
>>> executor. So a question is whether this belongs to the Calcite project, or
>>> some place else. sqlite is a C project, I didn't see any Java in their
>>> source tree.
>>> 
>>> Please note that SQLite is in the public domain, so their licensing terms
>>> are not an obstacle to using the test scripts.
>>> 
>>> The submitted code runs Calcite in its default configuration, but the
>>> intent is for other projects that build Calcite-based compilers to be able
>>> to test them by subclassing the "TestExecutors". In our own project (
>>> https://github.com/vmware/sql-to-dbsp-compiler) we have done exactly that,
>>> and we are not using the JDBC API.
>>> 
>>> The testsuite does find bugs in Calcite, both crashes and incorrect
>>> results. So I think its usefulness is not debated.
>>> 
>>> The second question is about the packaging of this program; right now it
>>> has a main() entry point and it prints the results to stderr for human
>>> consumption and triage. It is not clear to me how it should be inserted in
>>> a CI infrastructure, since running all 7 million tests could take a long
>>> time. One possible extension would be to have the program generate a
>>> regression test for Calcite for each bug it finds, but I haven't
>>> implemented this feature yet (and many failures could be due to the same
>>> bug). But even that mode would not naturally integrate in a CI
>>> infrastructure.
>>> 
>>> A simple possibility is for me to just publish the code as an independent
>>> project on github with an MIT license (the code is derived from our
>>> MIT-licensed project) and just advertise it to the Calcite community.
>>> 
>>> I would very much appreciate guidance.
>>> 
>>> Mihai Budiu
>>> 
