I think that "plus" is a good starting point for a general benchmark, and "ubenchmark" for fine-grained profiling of sub-components such as the planner, etc.
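For the fine-grained side, the basic pattern is warmup iterations followed by timed iterations (which is what a JMH-based module like "ubenchmark" automates). As a self-contained sketch of that pattern, with no Calcite or JMH dependency, something like the following could wrap a planner call; the `MicroBench` class, the arithmetic stand-in workload, and all parameter choices here are hypothetical, purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class MicroBench {

    /** Runs a workload {@code warmup} times untimed, then {@code iterations}
     *  times timed; returns the per-iteration elapsed nanoseconds. */
    static List<Long> measure(Supplier<?> workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload.get(); // give the JIT a chance to compile the hot path
        }
        List<Long> samples = new ArrayList<>(iterations);
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.get();
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real benchmark would instead invoke the
        // planner here, e.g. optimize one TPC-H query and return the plan.
        List<Long> samples = measure(() -> {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) {
                acc += i;
            }
            return acc;
        }, 100, 50);

        long total = 0;
        for (long s : samples) {
            total += s;
        }
        System.out.println("iterations=" + samples.size()
                + " mean_ns=" + (total / samples.size()));
    }
}
```

Reporting the full sample list rather than a single mean also lets us look at variance across runs, which matters for short planner invocations. For anything publishable, though, JMH (which handles dead-code elimination and forked JVMs) would be the safer harness.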
On Mon, Feb 5, 2018 at 8:08 PM, Julian Hyde <jh...@apache.org> wrote:

> Note that Calcite has a “plus” module which is a place to add other data
> sets (e.g. TPC-H, TPC-DS) and tests and benchmarks based on them. Also the
> “ubenchmark” module for micro-benchmarks. I don’t know whether the work you
> are planning would be a natural fit within these modules.
>
> > On Feb 5, 2018, at 4:38 PM, Edmon Begoli <ebeg...@gmail.com> wrote:
> >
> > I am going to create two JIRA issues:
> >
> > 1. Development of the benchmark for Calcite.
> >
> > 2. An R&D development focused on benchmarking, performance evaluation,
> > and a study.
> >
> > Thank you,
> > Edmon
> >
> > On Mon, Feb 5, 2018 at 9:26 AM, Michael Mior <mm...@uwaterloo.ca> wrote:
> >
> >> One interesting exercise would also be to pick a popular benchmark (e.g.
> >> TPC-H) and just look at the plan produced by Calcite vs. existing RDBMS
> >> optimizers (e.g. Postgres, MySQL). Along with performance analysis of the
> >> various options, it seems there's a paper in there.
> >>
> >> --
> >> Michael Mior
> >> mm...@apache.org
> >>
> >> 2018-02-03 23:21 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:
> >>
> >>> I am planning on opening an issue, and coordinating an initiative to
> >>> develop a Calcite-focused benchmark.
> >>>
> >>> This would lead to the development of an executable, reportable
> >>> benchmark, and of the next publication aimed at another significant
> >>> computer science conference or journal.
> >>>
> >>> Before I submit a JIRA issue, I would like to get your feedback on what
> >>> this benchmark might be, both in terms of what it should benchmark and
> >>> how it should be implemented.
> >>>
> >>> A couple of preliminary thoughts that came out of the conversation with
> >>> the co-authors of our SIGMOD paper are:
> >>>
> >>> * Optimizer runtime for complex queries (we could also compare with the
> >>> runtime of executing the optimized query directly)
> >>> * Calcite-optimized query
> >>> * Unoptimized query with the optimizer of the backend disabled
> >>> * Unoptimized query with the optimizer of the backend enabled
> >>> * Overhead of going through Calcite adapters vs. natively accessing the
> >>> target DB
> >>> * Comparison with other federated query processing engines such as
> >>> Spark SQL and PrestoDB
> >>> * Use TPC-H or TPC-DS for this purpose
> >>> * Use the Star Schema Benchmark (SSB)
> >>> * Planning and execution time with queries that span multiple systems
> >>> (e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra)
> >>>
> >>> Follow approaches similar to:
> >>> * https://www.slideshare.net/julianhyde/w-435phyde-3
> >>> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> >>> * https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
> >>> (How much of this is still relevant (Hive 0.14)? Can we use its
> >>> queries/benchmarks?)
> >>>
> >>> Please share your suggestions.