Note that Calcite has a “plus” module which is a place to add other data sets (e.g. TPC-H, TPC-DS) and tests and benchmarks based on them. There is also the “ubenchmark” module for micro-benchmarks. I don’t know whether the work you are planning would be a natural fit within these modules.
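For what it's worth, the ubenchmark module is JMH-based, and the measurement discipline JMH automates (warmup iterations so the JIT settles, a sink to defeat dead-code elimination, averaging over many invocations) can be sketched in plain Java. The workload below is a hypothetical stand-in; a real benchmark would call into Calcite's planner instead:

```java
/**
 * Minimal warmup-then-measure loop illustrating the pattern that JMH
 * automates in the ubenchmark module. The workload is a stand-in
 * (hypothetical); a real benchmark would invoke the planner here.
 */
public class MicroBench {
    // Volatile sink so the JIT cannot eliminate the workload as dead code
    // (JMH's Blackhole serves the same purpose).
    static volatile long sink;

    // Stand-in workload: sum of squares of 0..n-1.
    static long workload(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i;
        }
        return acc;
    }

    /**
     * Runs warmup iterations (results discarded) followed by measured
     * iterations; returns mean nanoseconds per invocation.
     */
    static double measure(int warmup, int iters, int n) {
        for (int i = 0; i < warmup; i++) {
            sink = workload(n);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink = workload(n);
        }
        return (System.nanoTime() - start) / (double) iters;
    }

    public static void main(String[] args) {
        double meanNs = measure(1_000, 10_000, 1_000);
        System.out.printf("mean %.1f ns/op%n", meanNs);
    }
}
```

A hand-rolled loop like this is fine for a first look, but JMH also handles forking, statistical reporting, and on-stack-replacement pitfalls, so the ubenchmark module is the better home for anything publishable.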
> On Feb 5, 2018, at 4:38 PM, Edmon Begoli <[email protected]> wrote:
>
> I am going to create two JIRA issues:
>
> 1. Development of the benchmark for Calcite.
>
> 2. An R&D effort focused on benchmarking, performance evaluation, and a
> study.
>
> Thank you,
> Edmon
>
> On Mon, Feb 5, 2018 at 9:26 AM, Michael Mior <[email protected]> wrote:
>
>> One interesting exercise would be to pick a popular benchmark (e.g.
>> TPC-H) and compare the plans produced by Calcite against those of
>> existing RDBMS optimizers (e.g. Postgres, MySQL). Along with a
>> performance analysis of the various options, it seems there's a paper
>> in there.
>>
>> --
>> Michael Mior
>> [email protected]
>>
>> 2018-02-03 23:21 GMT-05:00 Edmon Begoli <[email protected]>:
>>
>>> I am planning to open an issue and coordinate an initiative to
>>> develop a Calcite-focused benchmark.
>>>
>>> This would lead to the development of an executable, reportable
>>> benchmark, and of the next publication aimed at another significant
>>> computer science conference or journal.
>>>
>>> Before I submit a JIRA issue, I would like your feedback on what this
>>> benchmark should measure and how it should be implemented.
>>>
>>> A couple of preliminary thoughts that came out of the conversation
>>> with the co-authors of our SIGMOD paper:
>>>
>>> * Optimizer runtime for complex queries (we could also compare with
>>>   the runtime of executing the optimized query directly):
>>>   * Calcite-optimized query
>>>   * Unoptimized query with the backend's optimizer disabled
>>>   * Unoptimized query with the backend's optimizer enabled
>>> * Overhead of going through Calcite adapters vs. natively accessing
>>>   the target DB
>>> * Comparison with other federated query processing engines such as
>>>   Spark SQL and PrestoDB
>>>   * use TPC-H or TPC-DS for this purpose
>>>   * use the Star Schema Benchmark (SSB)
>>> * Planning and execution time for queries that span multiple systems
>>>   (e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra)
>>>
>>> Follow approaches similar to:
>>> * https://www.slideshare.net/julianhyde/w-435phyde-3
>>> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
>>> * https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
>>>   (How much of this is still relevant (Hive 0.14)? Can we reuse its
>>>   queries/benchmarks?)
>>>
>>> Please share your suggestions.
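On reporting the comparison matrix the quoted thread proposes (Calcite-optimized vs. unoptimized with the backend's optimizer on/off), one simple presentation is to normalize each configuration's runtime against the slowest baseline. A sketch, with all timings below being hypothetical placeholders rather than measurements:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of reporting benchmark configurations as speedups relative to
 * a chosen baseline. The runtime numbers used in main() are hypothetical
 * placeholders, not measurements of any system.
 */
public class SpeedupReport {
    /** Returns the speedup of each configuration vs. the named baseline. */
    static Map<String, Double> speedups(Map<String, Double> runtimeMs,
                                        String baseline) {
        double base = runtimeMs.get(baseline);
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : runtimeMs.entrySet()) {
            // speedup = baseline runtime / this configuration's runtime
            out.put(e.getKey(), base / e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> t = new LinkedHashMap<>();
        t.put("unoptimized, backend optimizer off", 400.0); // hypothetical
        t.put("unoptimized, backend optimizer on", 120.0);  // hypothetical
        t.put("Calcite-optimized", 100.0);                  // hypothetical
        speedups(t, "unoptimized, backend optimizer off")
            .forEach((k, v) -> System.out.printf("%-40s %.2fx%n", k, v));
    }
}
```

Reporting ratios rather than raw times makes runs on different hardware at least loosely comparable, though the raw numbers should still be published alongside them.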
