Edmon Begoli created CALCITE-2169:
-------------------------------------
Summary: Conduct a comparative performance study of the framework
Key: CALCITE-2169
URL: https://issues.apache.org/jira/browse/CALCITE-2169
Project: Calcite
Issue Type: Task
Components: core
Environment: Use the Calcite benchmark, and run it in the benchmark
environment.
Reporter: Edmon Begoli
Assignee: Edmon Begoli
Design and implement a performance study of the Calcite framework using the
benchmark that is to be developed for CALCITE-2168 (Implement a General Purpose
Benchmark for Calcite). The study should run a comparative analysis of the
performance of the Calcite optimizer itself, of queries with and without
Calcite optimization, and of both against standalone databases or other
frameworks.
Some ideas and targets for the study:
* Planning and execution time with queries that span across multiple systems
(e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra).
* For TPC-DS, study the plan produced by Calcite vs. existing RDBMS optimizers
(e.g. Postgres, MySQL). This could even become a feature used in conjunction
with the lattice framework: an estimate of time savings would help decide
which queries to eventually build lattices for.
* Optimizer runtime for complex queries (we could also compare with the runtime
of executing the optimized query directly)
  * Calcite optimized query
  * Unoptimized query with the optimizer of the backend disabled
  * Unoptimized query with the optimizer of the backend enabled
* Comparison with other federated query processing engines such as Spark SQL,
PrestoDB, and maybe KSQL[1] and InfluxDB
* Grunion, which uses Calcite to optimize Spark queries [2]
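
To make the measurements above concrete, here is a minimal sketch of a timing
harness that records planning time and execution time separately for each
query variant (for example: Calcite optimized, backend optimizer disabled,
backend optimizer enabled). It is not part of Calcite or of the CALCITE-2168
benchmark. The names time_variant and compare, and the plan/execute callables
passed to them, are hypothetical placeholders that a real study would wire to
the actual planner and backends.

```python
import statistics
import time
from typing import Callable, Dict, List, NamedTuple, Tuple


class Timing(NamedTuple):
    plan_ms: float     # median planning time, in milliseconds
    execute_ms: float  # median execution time, in milliseconds


# A variant is a (plan, execute) pair: plan() returns some plan object,
# execute(plan_object) runs it. Both are hypothetical stand-ins.
Variant = Tuple[Callable[[], object], Callable[[object], object]]


def time_variant(plan: Callable[[], object],
                 execute: Callable[[object], object],
                 runs: int = 5) -> Timing:
    """Time planning and execution separately; report medians over `runs`."""
    plan_samples: List[float] = []
    exec_samples: List[float] = []
    for _ in range(runs):
        t0 = time.perf_counter()
        planned = plan()
        t1 = time.perf_counter()
        execute(planned)
        t2 = time.perf_counter()
        plan_samples.append((t1 - t0) * 1000.0)
        exec_samples.append((t2 - t1) * 1000.0)
    return Timing(statistics.median(plan_samples),
                  statistics.median(exec_samples))


def compare(variants: Dict[str, Variant], runs: int = 5) -> Dict[str, Timing]:
    """Run every named variant through time_variant and collect the results."""
    return {name: time_variant(plan, execute, runs)
            for name, (plan, execute) in variants.items()}


if __name__ == "__main__":
    # Placeholder callables only; they do no real planning or execution.
    results = compare({
        "calcite_optimized": (lambda: "plan-a", lambda p: None),
        "backend_optimizer_off": (lambda: "plan-b", lambda p: None),
        "backend_optimizer_on": (lambda: "plan-c", lambda p: None),
    })
    for name, timing in results.items():
        print(f"{name}: plan={timing.plan_ms:.3f} ms, "
              f"execute={timing.execute_ms:.3f} ms")
```

The median is used rather than the mean so that a single slow run (e.g. a cold
cache) does not dominate the result; a real study would likely also add warm-up
runs and report variance.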
[1] https://github.com/confluentinc/ksql
[2] https://www.datascience.com/blog/grunion-data-science-tools-query-optimizer-apache-spark