Great idea, I think.

Can we also use the tool to compare, let's say, the H2-based engine against
the Calcite-based one, to detect possible issues when the new engine performs
worse than H2?

Regards
Igor

Fri, May 22, 2020, 22:36 Denis Magda <dma...@apache.org>:

> Hi Roman,
>
> +1 for sure. On a side note, should we create a separate ASF/Git repository
> for the project? Not sure we need to put the suite in the main Ignite repo.
>
> -
> Denis
>
>
> On Fri, May 22, 2020 at 8:54 AM Roman Kondakov <kondako...@mail.ru.invalid
> >
> wrote:
>
> > Hi everybody!
> >
> > Currently Ignite doesn't have the ability to detect SQL performance
> > regressions between different versions. We have a Yardstick benchmark
> > module, but it has several drawbacks:
> > - it doesn't compare different Ignite versions
> > - it doesn't check the query result
> > - it doesn't have the ability to execute randomized SQL queries (aka
> > fuzzy testing)
> >
> > So, Yardstick is not very helpful for detecting SQL performance
> > regressions.
> >
> > I think we need a brand-new framework for this task, and I propose to
> > implement it by adopting the ideas from the Apollo tool paper [1].
> > The Apollo pipeline works like this:
> >
> > 1. Apollo starts two different versions of the database simultaneously.
> > 2. Then Apollo populates them with the same dataset.
> > 3. Apollo generates random SQL queries using an external library (e.g.
> > SQLSmith [2]).
> > 4. Each query is executed in both database versions. Execution time is
> > measured by the framework.
> > 5. If the execution time difference for the same query exceeds some
> > threshold (say, 2x slower), the query is logged.
> > 6. Apollo then tries to simplify the problematic queries in order to
> > obtain the minimal reproducer.
> > 7. Apollo also has the ability to automatically perform a binary search
> > over the git history to find the bad commit.
> > 8. It can also localize the root cause of a regression by carrying out
> > statistical debugging.
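The comparison core of steps 4-5 can be sketched in a few lines. This is a hypothetical illustration, not Apollo's actual API: the engine executions are modeled as functions from a query string to its measured execution time, so the threshold logic can be shown without a running database. The names `RegressionDetector`, `oldEngineNanos`, and `newEngineNanos` are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch of Apollo steps 4-5. Each engine is modeled as a
// function from a query string to its execution time in nanoseconds.
class RegressionDetector {
    static final double SLOWDOWN_THRESHOLD = 2.0; // flag queries 2x slower or worse

    // Returns the queries whose time on the new engine exceeds
    // SLOWDOWN_THRESHOLD times their time on the old engine.
    static List<String> findRegressions(List<String> queries,
                                        ToLongFunction<String> oldEngineNanos,
                                        ToLongFunction<String> newEngineNanos) {
        List<String> flagged = new ArrayList<>();
        for (String q : queries) {
            long oldT = oldEngineNanos.applyAsLong(q);
            long newT = newEngineNanos.applyAsLong(q);
            if (newT > oldT * SLOWDOWN_THRESHOLD)
                flagged.add(q); // step 5: log the problematic query
        }
        return flagged;
    }
}
```

In a real harness the two functions would wrap JDBC calls against the two database versions and time them with System.nanoTime().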
> >
> > I think we don't have to implement all of these Apollo steps. The first
> > four steps will be enough for our needs.
> >
> > My proposal is to create a new module called 'sql-testing'. We need a
> > separate module because it should be suitable for both query engines: the
> > H2-based one and the upcoming Calcite-based one. This module will contain
> > a test suite which works in the following way:
> > 1. It starts two Ignite clusters with different versions (current
> > version and the previous release version).
> > 2. The framework then runs randomly generated queries in both clusters
> > and checks the execution time for each cluster. We need to port the
> > SQLSmith [2] library from C++ to Java for this step, but initially we
> > can start with a set of hardcoded queries and postpone the SQLSmith
> > port; randomized queries can be added later.
> > 3. All problematic queries are then reported as performance issues. In
> > this way we can manually examine the problems.
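Until a SQLSmith port exists, the hardcoded-query stage of step 2 could start from something as simple as a seeded template generator. The sketch below is purely illustrative (the table and column names are made up, not from any Ignite schema); a fixed seed keeps runs reproducible, which matters when re-running a flagged query.

```java
import java.util.Random;

// Hypothetical stand-in for the SQLSmith port: produces simple randomized
// SELECT statements from fixed templates. Seeding the RNG makes every run
// deterministic, so a flagged query can be regenerated exactly.
class SimpleQueryGenerator {
    private static final String[] TABLES = {"person", "city", "company"};
    private static final String[] COLUMNS = {"id", "name", "val"};

    private final Random rnd;

    SimpleQueryGenerator(long seed) {
        this.rnd = new Random(seed);
    }

    // Builds one randomized query from the templates above.
    String next() {
        String tbl = TABLES[rnd.nextInt(TABLES.length)];
        String col = COLUMNS[rnd.nextInt(COLUMNS.length)];
        int bound = rnd.nextInt(1000);
        return "SELECT " + col + " FROM " + tbl + " WHERE id < " + bound;
    }
}
```

A real generator would of course need to know the actual schema and cover joins, aggregates, and subqueries, which is where the SQLSmith port comes in.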
> >
> > This tool will bring a certain amount of robustness to our SQL layer, as
> > well as some confidence in the absence of SQL query regressions.
> >
> > What do you think?
> >
> >
> > [1] http://www.vldb.org/pvldb/vol13/p57-jung.pdf
> > [2] https://github.com/anse1/sqlsmith
> >
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
>