I would think that a TPC-DS benchmark would be more appropriate for the type of queries I'd be interested in working with in Calcite. Also, as an end result of these efforts, I would imagine the community would get better instrumentation of metrics up and down the query processing pipeline, from parsing to optimizing, rewrites, etc. This would be interesting even as a feature to use in conjunction with the lattice framework: an estimate of the time savings could help decide which queries to eventually build lattices for.

Ruhollah Farchtchi
ruhollah.farcht...@gmail.com

On Mon, Feb 5, 2018 at 9:26 AM, Michael Mior <mm...@uwaterloo.ca> wrote:

> One interesting exercise would also be to pick a popular benchmark (e.g.
> TPC-H) and just look at the plan produced by Calcite vs existing RDBMS
> optimizers (e.g. Postgres, MySQL). Along with performance analysis of the
> various options, it seems there's a paper in there.
>
> --
> Michael Mior
> mm...@apache.org
>
> 2018-02-03 23:21 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:
>
> > I am planning on opening an issue, and coordinating an initiative to
> > develop a Calcite-focused benchmark.
> >
> > This would lead to the development of the executable, reportable
> > benchmark, and of the next publication aimed at another significant
> > computer science conference or a journal.
> >
> > Before I submit a JIRA issue, I would like to get your feedback on what
> > this benchmark might be both in terms of what it should benchmark, and
> > how it should be implemented.
> >
> > Couple of preliminary thoughts that came out of the conversation with the
> > co-authors of our SIGMOD paper are:
> >
> > * Optimizer runtime for complex queries (we could also compare with the
> >   runtime of executing the optimized query directly)
> >   * Calcite optimized query
> >   * Unoptimized query with the optimizer of the backend disabled
> >   * Unoptimized query with the optimizer of the backend enabled
> > * Overhead of going through Calcite adapters vs. natively accessing the
> >   target DB
> > * Comparison with other federated query processing engines such as Spark
> >   SQL and PrestoDB
> >   * use TPC-H or DS for this purpose
> >   * use Star Schema Benchmark (SSB)
> > * Planning and execution time with queries that span across multiple
> >   systems (e.g. Postgres and Cassandra, Postgres and Pig, Pig and
> >   Cassandra).
> >
> > Follow approaches similar to:
> > * https://www.slideshare.net/julianhyde/w-435phyde-3
> > * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> > * (How much of this is still relevant (Hive 0.14)? Can we use
> >   queries/benchmarks?)
> >   https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
> >
> > Please share your suggestions.