I am planning to open an issue and to coordinate an initiative to develop a Calcite-focused benchmark.
This would lead to the development of an executable, reportable benchmark, and of the next publication, aimed at another significant computer science conference or journal. Before I submit a JIRA issue, I would like to get your feedback on what this benchmark might be, both in terms of what it should benchmark and how it should be implemented. A couple of preliminary thoughts that came out of the conversation with the co-authors of our SIGMOD paper:

* Optimizer runtime for complex queries (we could also compare with the runtime of executing the optimized query directly); a JMH sketch of such a planning-time measurement is included at the end of this message. Execution-time baselines to compare:
  * the Calcite-optimized query
  * the unoptimized query with the backend's optimizer disabled
  * the unoptimized query with the backend's optimizer enabled
* Overhead of going through Calcite adapters vs. natively accessing the target DB (a rough timing sketch is also included below).
* Comparison with other federated query processing engines such as Spark SQL and PrestoDB:
  * use TPC-H or TPC-DS for this purpose
  * use the Star Schema Benchmark (SSB)
* Planning and execution time for queries that span multiple systems (e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra).

We could follow approaches similar to:

* https://www.slideshare.net/julianhyde/w-435phyde-3
* https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
* https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/ (how much of this is still relevant, given it targets Hive 0.14? Can we reuse its queries/benchmarks?)

Please share your suggestions.
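As a starting point for the optimizer-runtime item, here is a minimal sketch of a planning-time benchmark using JMH (which, if I am not mistaken, Calcite already uses in its ubenchmark module). The class, the toy ReflectiveSchema, and the query are made up for illustration; a real benchmark would register TPC-H/TPC-DS tables instead. Note it measures only parsing, validation, and SQL-to-rel conversion; exercising the cost-based phase would additionally require configuring rule sets and calling planner.transform(...).

```java
import java.util.concurrent.TimeUnit;

import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class PlanningBenchmark {

  /** Toy in-memory schema; a real benchmark would register TPC-H tables. */
  public static class Hr {
    public final Emp[] emps = {new Emp(100, "Bill"), new Emp(200, "Eric")};
  }

  /** Row type of the "emps" table; ReflectiveSchema derives columns from fields. */
  public static class Emp {
    public final int empid;
    public final String name;
    public Emp(int empid, String name) {
      this.empid = empid;
      this.name = name;
    }
  }

  private FrameworkConfig config;

  @Setup
  public void setup() {
    SchemaPlus rootSchema = Frameworks.createRootSchema(true);
    rootSchema.add("hr", new ReflectiveSchema(new Hr()));
    config = Frameworks.newConfigBuilder().defaultSchema(rootSchema).build();
  }

  /**
   * Measures parse + validate + SQL-to-rel conversion (planner construction
   * is included). Cost-based optimization would additionally need rule sets
   * in the config and a call to planner.transform(...).
   */
  @Benchmark
  public RelNode parseValidateConvert() throws Exception {
    Planner planner = Frameworks.getPlanner(config);
    SqlNode parsed = planner.parse(
        "select \"name\" from \"hr\".\"emps\" where \"empid\" > 150");
    SqlNode validated = planner.validate(parsed);
    return planner.rel(validated).project();
  }
}
```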

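And a deliberately naive sketch of the adapter-overhead comparison: the same query is timed once through Calcite's JDBC driver, with a model file that wraps a Postgres database behind a "jdbc" schema, and once directly against Postgres. The model path, database name, and table are assumptions, connection setup is included in the timing, and there is no warm-up, so for real numbers this would also be wrapped in JMH; it is only meant to show the shape of the measurement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AdapterOverheadSketch {
  // Both URLs are assumptions: model.json would declare a "jdbc" schema
  // (as its default schema) pointing at the same Postgres database that
  // DIRECT connects to.
  private static final String VIA_CALCITE =
      "jdbc:calcite:model=src/test/resources/model.json";
  private static final String DIRECT = "jdbc:postgresql://localhost/tpch";

  /** Runs the query and drains the result set so both paths do equal work. */
  static long timeQueryNanos(String url, String sql) throws Exception {
    long start = System.nanoTime();
    try (Connection c = DriverManager.getConnection(url);
         Statement s = c.createStatement();
         ResultSet r = s.executeQuery(sql)) {
      while (r.next()) {
        // Drain rows; a real harness would also checksum them.
      }
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws Exception {
    // Quoted so the lower-case Postgres table name resolves identically
    // through Calcite's default lexical rules and through Postgres itself.
    String sql = "select count(*) from \"lineitem\"";
    System.out.printf("via Calcite adapter: %d ns%n", timeQueryNanos(VIA_CALCITE, sql));
    System.out.printf("direct Postgres:     %d ns%n", timeQueryNanos(DIRECT, sql));
  }
}
```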