I think that "plus" is a good starting point for a general benchmark, and "ubenchmark" for fine-grained profiling of sub-components such as the planner, etc.
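For the fine-grained side, the basic pattern is warmup iterations followed by timed iterations (which is what a JMH-based module like "ubenchmark" automates). As a self-contained sketch of that pattern, with no Calcite or JMH dependency, something like the following could wrap a planner call; the `MicroBench` class, the arithmetic stand-in workload, and all parameter choices here are hypothetical, purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class MicroBench {

    /** Runs a workload {@code warmup} times untimed, then {@code iterations}
     *  times timed; returns the per-iteration elapsed nanoseconds. */
    static List<Long> measure(Supplier<?> workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            workload.get(); // give the JIT a chance to compile the hot path
        }
        List<Long> samples = new ArrayList<>(iterations);
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.get();
            samples.add(System.nanoTime() - start);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real benchmark would instead invoke the
        // planner here, e.g. optimize one TPC-H query and return the plan.
        List<Long> samples = measure(() -> {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) {
                acc += i;
            }
            return acc;
        }, 100, 50);

        long total = 0;
        for (long s : samples) {
            total += s;
        }
        System.out.println("iterations=" + samples.size()
                + " mean_ns=" + (total / samples.size()));
    }
}
```

Reporting the full sample list rather than a single mean also lets us look at variance across runs, which matters for short planner invocations. For anything publishable, though, JMH (which handles dead-code elimination and forked JVMs) would be the safer harness.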
On Mon, Feb 5, 2018 at 8:08 PM, Julian Hyde <jh...@apache.org> wrote:

> Note that Calcite has a “plus” module which is a place to add other data
> sets (e.g. TPC-H, TPC-DS) and tests and benchmarks based on them. Also the
> “ubenchmark” module for micro-benchmarks. I don’t know whether the work you
> are planning would be a natural fit within these modules.
>
> > On Feb 5, 2018, at 4:38 PM, Edmon Begoli <ebeg...@gmail.com> wrote:
> >
> > I am going to create two JIRA issues:
> >
> > 1. Development of the benchmark for Calcite.
> >
> > 2. An R&D development focused on benchmarking, performance evaluation,
> > and a study.
> >
> > Thank you,
> > Edmon
> >
> > On Mon, Feb 5, 2018 at 9:26 AM, Michael Mior <mm...@uwaterloo.ca> wrote:
> >
> >> One interesting exercise would also be to pick a popular benchmark (e.g.
> >> TPC-H) and just look at the plan produced by Calcite vs. existing RDBMS
> >> optimizers (e.g. Postgres, MySQL). Along with performance analysis of the
> >> various options, it seems there's a paper in there.
> >>
> >> --
> >> Michael Mior
> >> mm...@apache.org
> >>
> >> 2018-02-03 23:21 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:
> >>
> >>> I am planning on opening an issue, and coordinating an initiative to
> >>> develop a Calcite-focused benchmark.
> >>>
> >>> This would lead to the development of an executable, reportable
> >>> benchmark, and of the next publication aimed at another significant
> >>> computer science conference or journal.
> >>>
> >>> Before I submit a JIRA issue, I would like to get your feedback on what
> >>> this benchmark might be, both in terms of what it should benchmark and
> >>> how it should be implemented.
> >>>
> >>> A couple of preliminary thoughts that came out of the conversation with
> >>> the co-authors of our SIGMOD paper are:
> >>>
> >>> * Optimizer runtime for complex queries (we could also compare with the
> >>> runtime of executing the optimized query directly)
> >>> * Calcite-optimized query
> >>> * Unoptimized query with the optimizer of the backend disabled
> >>> * Unoptimized query with the optimizer of the backend enabled
> >>> * Overhead of going through Calcite adapters vs. natively accessing the
> >>> target DB
> >>> * Comparison with other federated query processing engines such as
> >>> Spark SQL and PrestoDB
> >>> * Use TPC-H or TPC-DS for this purpose
> >>> * Use the Star Schema Benchmark (SSB)
> >>> * Planning and execution time with queries that span multiple systems
> >>> (e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra)
> >>>
> >>> Follow approaches similar to:
> >>> * https://www.slideshare.net/julianhyde/w-435phyde-3
> >>> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> >>> * https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
> >>> (How much of this is still relevant (Hive 0.14)? Can we use its
> >>> queries/benchmarks?)
> >>>
> >>> Please share your suggestions.