I would think that a TPC-DS benchmark would be more appropriate for the
type of queries I'd be interested in running with Calcite. Also, as an end
result of these efforts, I would imagine the community would get better
instrumentation of metrics up and down the query processing pipeline, from
parsing to optimizing, rewrites, etc. This would be interesting even as a
feature to use in conjunction with the lattice framework: an estimate of
time savings could help decide which queries to eventually build lattices
for.

Ruhollah Farchtchi
ruhollah.farcht...@gmail.com

On Mon, Feb 5, 2018 at 9:26 AM, Michael Mior <mm...@uwaterloo.ca> wrote:

> One interesting exercise would also be to pick a popular benchmark (e.g.
> TPC-H) and just look at the plan produced by Calcite vs existing RDBMS
> optimizers (e.g. Postgres, MySQL). Along with performance analysis of the
> various options, it seems there's a paper in there.
>
> --
> Michael Mior
> mm...@apache.org
>
> 2018-02-03 23:21 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:
>
> > I am planning on opening an issue, and coordinating an initiative to
> > develop a Calcite-focused benchmark.
> >
> > This would lead to the development of an executable, reportable
> > benchmark, and of the next publication aimed at another significant
> > computer science conference or journal.
> >
> > Before I submit a JIRA issue, I would like to get your feedback on what
> > this benchmark might be, both in terms of what it should benchmark and
> > how it should be implemented.
> >
> > A couple of preliminary thoughts that came out of the conversation with
> > the co-authors of our SIGMOD paper are:
> >
> > * Optimizer runtime for complex queries (we could also compare with the
> > runtime of executing the optimized query directly)
> > * Calcite optimized query
> > * Unoptimized query with the optimizer of the backend disabled
> > * Unoptimized query with the optimizer of the backend enabled
> > * Overhead of going through Calcite adapters vs. natively accessing the
> > target DB
> > * Comparison with other federated query processing engines such as Spark
> > SQL and PrestoDB
> > * use TPC-H or TPC-DS for this purpose
> > * use Star Schema Benchmark (SSB)
> > * Planning and execution time with queries that span multiple
> > systems (e.g. Postgres and Cassandra, Postgres and Pig, Pig and
> > Cassandra).
> >
> >
> >
> > Follow approaches similar to:
> > * https://www.slideshare.net/julianhyde/w-435phyde-3
> > * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> > * (How much of this is still relevant (Hive 0.14)? Can we use
> > queries/benchmarks?)
> > https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
> >
> >
> > Please share your suggestions.
> >
>
