Hi Julian, Stamatis, Seung-Hwan,
@Julian, Thank you for suggesting JMH for performance analysis. I prepared
a TPC-H benchmark test using JMH and as you already mentioned, the time
for each phases reduced drastically. I was surprised and the same time
happy to see the benchmark results.
@Stamatis, Yes you are correct. The support for adding the custom rules in
Calcite will beat the overall performance of built-in rules-set of any
RDBMS. I can convince my guide with regards to these points .
@Seung-Hwan, Thank you so much for your interest in collaborating with us.
I'll be also very happy to contribute towards the benchmark analysis for
Calcite.
The JMH benchmark test for TPC-H and the results of its comparison with
postgres is also attached to this email. Please find them and I appreciate
your feedback in this regard.
Thanking you
Lekshmi B.G
Email: [email protected]
On Mon, Dec 31, 2018 at 3:53 PM Lim, Seung-Hwan <[email protected]>
wrote:
> Hi Lekshmi,
>
> I am one of the members in Edmon Begoli’s team who did preliminary work on
> comparison Calcite with conventional RDBMS, especially postgresql.
>
> The major challenge that we’ve identified is that many benchmarks (e.g.,
> TPC-H, TPC-DS) evaluates the performance of ‘Join’ operations. For both
> benchmarks, Calcite often generated less optimized plan than RDBMS.
>
> I’ll be very happy to help you or collaborate with you in this regard.
>
> Thank you,
> Seung-Hwan
>
>
> > On Dec 31, 2018, at 8:33 AM, Stamatis Zampetakis <[email protected]>
> wrote:
> >
> > Hi Lekshmi,
> >
> > Thanks for the interesting information. It is good to see more people
> > involved in the benchmark and optimizations on Calcite.
> >
> > However, I am not sure I understand what you are trying to achieve by
> > performing an all-in-all comparison between Calcite and other databases
> (in
> > the particular case with Postgres).
> > Calcite provides you everything you need to build a database but itself
> is
> > not a database. Could you possibly share a bit more information on what
> you
> > are expecting to gain from these kind of experiments.
> >
> > On the other hand, it would be very interesting to compare individual
> parts
> > of Calcite (e.g., optimizer) with the respective ones of Postgres (or
> other
> > database) although this will not be easy.
> > If for instance, you want to compare the optimizers in terms of
> > performance, time may not be a good metric since C code will almost
> always
> > be faster than Java code.
> > Another comparison axe for the optimizer, could be the quality of the
> > produced plans but finding a good metric can be also challenging.
> Measuring
> > the quality of the plan could be based on the execution time of the plan
> on
> > the same engine (all in Calcite or all in Postgres for instance). In
> terms
> > of research, I guess it would be very nice to demonstrate that a Volcano
> > optimizer (Calcite) with a custom rule-set can beat the built-in
> optimizer
> > of Postgres in terms of plan quality; plus it would be very useful for
> many
> > end-users of Calcite to have a rule-set that simulates the optimizer of
> > Postgres (or another database).
> >
> > As a general comment, I think it would be easier to find good use cases
> in
> > favor of Calcite if you emphasize in data integration scenarios,
> > cross-database queries, querying raw data (not in a database) and/or
> > systems without an optimizer.
> >
> > Best,
> > Stamatis
> >
> > Στις Δευ, 31 Δεκ 2018 στις 12:35 μ.μ., ο/η Lekshmi <
> [email protected]>
> > έγραψε:
> >
> >> Hi Julian,
> >>
> >> Thanks for a lot for the prompt response and support. I will try running
> >> the test with JMH and will let you know the feedback.
> >>
> >> I wish you all have a prosperous new year.
> >>
> >> Thanks and Regards
> >>
> >> Lekshmi B.G
> >> Email: [email protected]
> >>
> >>
> >>
> >>
> >> On Mon, Dec 31, 2018 at 10:38 AM Julian Feinauer <
> >> [email protected]> wrote:
> >>
> >>> Hi Lekshmi,
> >>>
> >>> your activity sounds very interesting.
> >>> One important thing to note is that Performance testing in Java is
> always
> >>> tricky due to JIT and "warmup" phase of the JVM. Thus it is generally
> >>> recommended to do these tests with JMH (
> >>> https://openjdk.java.net/projects/code-tools/jmh/).
> >>>
> >>> I would assume that the time for sql2rel reduces drastically (perhaps
> one
> >>> or two orders) when run with JMH.
> >>>
> >>> Best
> >>> Julian
> >>>
> >>> Am 30.12.18, 23:12 schrieb "Lekshmi" <[email protected]>:
> >>>
> >>> Hello Folks,
> >>>
> >>> For my research activities, I was trying to perform a benchmark
> >>> comparison
> >>> between calcite with other database systems. As an initial step, I
> >> was
> >>> trying to do it for *Calcite* and *PostgresSql*. So, I thought TPCH
> >>> queries
> >>> were the right thing to start with. I tried running the TpchTest (
> >>>
> >>>
> >>
> https://github.com/apache/calcite/blob/master/plus/src/test/java/org/apache/calcite/adapter/tpch/TpchTest.java
> >>> )
> >>> by adding the *CalciteTimingTracer* in the junit tests to determine
> >> the
> >>> execution time. While doing so, I could see that the execution time
> >> in
> >>> calcite is significantly higher compared to postgresSql. On further
> >>> investigation, I could see that we generate the required datas
> >>> required for
> >>> these queries(which comes around 150,000 for some tables) and I was
> >>> under
> >>> an impression that most of the time was spend on the data generation
> >>> and
> >>> that the query execution could be faster. So, I modified the
> relevant
> >>> schema class (
> >>>
> >>>
> >>
> https://github.com/apache/calcite/blob/master/plus/src/main/java/org/apache/calcite/adapter/tpch/TpchSchema.java
> >>> )
> >>> to perform the data generation and query execution separately. Then,
> >> I
> >>> traced the time took for just query execution. Even, then there was
> a
> >>> significant difference from that of PostgresSql.
> >>>
> >>> I, also enabled the *log4j.rootLogger* to *TRACE * to find the time
> >>> spend
> >>> for sql2rel and optimization phases of the class Prepare
> >>> <
> >>>
> >>>
> >>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/Prepare.java
> >>>> .
> >>> And, to my surprise, I could see that calcite takes a time of 355ms
> >> for
> >>> sql2rel and 352ms for optimization for the junit test *testQuery01*.
> >>> On the
> >>> other side, the same query gave a planning time of 0.163ms in
> >> Postgres.
> >>>
> >>> I would like to know, if this is the right way to test the
> >> performance
> >>> of
> >>> TPCH queries using apache calcite. Can anyone let me know if there
> >>> exist
> >>> any better ways to do it.
> >>>
> >>> And, while searching through JIRA, I could find a ticket
> >>> https://issues.apache.org/jira/browse/CALCITE-2169 which was
> created
> >>> by
> >>> Edmon Begoli for performing a comparative performance study of the
> >>> calcite
> >>> framework. I think, its related to my current problem. I have no
> idea
> >>> regarding the status of the ticket. It would be really great if
> >> someone
> >>> could help me with some information on it.
> >>>
> >>> Also, now coming to the personal preference, I would like to
> continue
> >>> my
> >>> research in calcite due to its simplicity and extensibility. But,
> >> if I
> >>> fail to give a good case study in favour of Calcite, I am afraid
> >> that I
> >>> could loose an opportunity to work with calcite.
> >>>
> >>> Thanks and Regards
> >>>
> >>> Lekshmi B.G
> >>> Email: [email protected]
> >>>
> >>>
> >>>
> >>
>
>
/**
*
*/
package org.fau.tpch.benchmark;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.calcite.adapter.tpch.TpchSchema;
import org.apache.calcite.test.CalciteAssert;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import com.google.common.collect.ImmutableList;
/**
* @author lekshmi
*
*/
@Fork(value = 5, jvmArgsPrepend = "-Xmx2048m")
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class TpchBenchmarkTest
{
private static String schema(String name, String scaleFactor)
{
return " {\n"
+ " type: 'custom',\n"
+ " name: '" + name + "',\n"
+ " factory: 'org.apache.calcite.adapter.tpch.TpchSchemaFactory',\n"
+ " operand: {\n"
+ " columnPrefix: true,\n"
+ " scale: " + scaleFactor + "\n"
+ " }\n"
+ " }";
}
public static final String TPCH_MODEL = "{\n"
+ " version: '1.0',\n"
+ " defaultSchema: 'TPCH',\n"
+ " schemas: [\n"
+ schema("TPCH", "1.0") + ",\n"
+ schema("TPCH_01", "0.01") + ",\n"
+ schema("TPCH_5", "5.0") + "\n"
+ " ]\n"
+ "}";
static final List<String> QUERIES = ImmutableList.of(
// 01
"select\n"
+ " l_returnflag,\n"
+ " l_linestatus,\n"
+ " sum(l_quantity) as sum_qty,\n"
+ " sum(l_extendedprice) as sum_base_price,\n"
+ " sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,\n"
+ " sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,\n"
+ " avg(l_quantity) as avg_qty,\n"
+ " avg(l_extendedprice) as avg_price,\n"
+ " avg(l_discount) as avg_disc,\n"
+ " count(*) as count_order\n"
+ "from\n"
+ " tpch.lineitem\n"
+ "-- where\n"
// + "-- l_shipdate <= date '1998-12-01' - interval '120'
// day (3)\n"
+ "-- l_shipdate <= date '1998-12-01' - interval '117' day \n"
+ "group by\n"
+ " l_returnflag,\n"
+ " l_linestatus\n"
+ "\n"
+ "order by\n"
+ " l_returnflag,\n"
+ " l_linestatus\n"
+ "limit 1",
// 02
"select\n"
+ " s.s_acctbal,\n"
+ " s.s_name,\n"
+ " n.n_name,\n"
+ " p.p_partkey,\n"
+ " p.p_mfgr,\n"
+ " s.s_address,\n"
+ " s.s_phone,\n"
+ " s.s_comment\n"
+ "from\n"
+ " tpch.part p,\n"
+ " tpch.supplier s,\n"
+ " tpch.partsupp ps,\n"
+ " tpch.nation n,\n"
+ " tpch.region r\n"
+ "where\n"
+ " p.p_partkey = ps.ps_partkey\n"
+ " and s.s_suppkey = ps.ps_suppkey\n"
+ " and p.p_size = 41\n"
+ " and p.p_type like '%NICKEL'\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_regionkey = r.r_regionkey\n"
+ " and r.r_name = 'EUROPE'\n"
+ " and ps.ps_supplycost = (\n"
+ "\n"
+ " select\n"
+ " min(ps.ps_supplycost)\n"
+ "\n"
+ " from\n"
+ " tpch.partsupp ps,\n"
+ " tpch.supplier s,\n"
+ " tpch.nation n,\n"
+ " tpch.region r\n"
+ " where\n"
+ " p.p_partkey = ps.ps_partkey\n"
+ " and s.s_suppkey = ps.ps_suppkey\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_regionkey = r.r_regionkey\n"
+ " and r.r_name = 'EUROPE'\n"
+ " )\n"
+ "\n"
+ "order by\n"
+ " s.s_acctbal desc,\n"
+ " n.n_name,\n"
+ " s.s_name,\n"
+ " p.p_partkey\n"
+ "limit 100",
// 03
"select\n"
+ " l.l_orderkey,\n"
+ " sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,\n"
+ " o.o_orderdate,\n"
+ " o.o_shippriority\n"
+ "\n"
+ "from\n"
+ " tpch.customer c,\n"
+ " tpch.orders o,\n"
+ " tpch.lineitem l\n"
+ "\n"
+ "where\n"
+ " c.c_mktsegment = 'HOUSEHOLD'\n"
+ " and c.c_custkey = o.o_custkey\n"
+ " and l.l_orderkey = o.o_orderkey\n"
+ "-- and o.o_orderdate < date '1995-03-25'\n"
+ "-- and l.l_shipdate > date '1995-03-25'\n"
+ "\n"
+ "group by\n"
+ " l.l_orderkey,\n"
+ " o.o_orderdate,\n"
+ " o.o_shippriority\n"
+ "order by\n"
+ " revenue desc,\n"
+ " o.o_orderdate\n"
+ "limit 10",
// 04
"select\n"
+ " o_orderpriority,\n"
+ " count(*) as order_count\n"
+ "from\n"
+ " tpch.orders\n"
+ "\n"
+ "where\n"
+ "-- o_orderdate >= date '1996-10-01'\n"
+ "-- and o_orderdate < date '1996-10-01' + interval '3' month\n"
+ "-- and \n"
+ " exists (\n"
+ " select\n"
+ " *\n"
+ " from\n"
+ " tpch.lineitem\n"
+ " where\n"
+ " l_orderkey = o_orderkey\n"
+ " and l_commitdate < l_receiptdate\n"
+ " )\n"
+ "group by\n"
+ " o_orderpriority\n"
+ "order by\n"
+ " o_orderpriority",
// 05
"select\n"
+ " n.n_name,\n"
+ " sum(l.l_extendedprice * (1 - l.l_discount)) as revenue\n"
+ "\n"
+ "from\n"
+ " tpch.customer c,\n"
+ " tpch.orders o,\n"
+ " tpch.lineitem l,\n"
+ " tpch.supplier s,\n"
+ " tpch.nation n,\n"
+ " tpch.region r\n"
+ "\n"
+ "where\n"
+ " c.c_custkey = o.o_custkey\n"
+ " and l.l_orderkey = o.o_orderkey\n"
+ " and l.l_suppkey = s.s_suppkey\n"
+ " and c.c_nationkey = s.s_nationkey\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_regionkey = r.r_regionkey\n"
+ " and r.r_name = 'EUROPE'\n"
+ "-- and o.o_orderdate >= date '1997-01-01'\n"
+ "-- and o.o_orderdate < date '1997-01-01' + interval '1' year\n"
+ "group by\n"
+ " n.n_name\n"
+ "\n"
+ "order by\n"
+ " revenue desc",
// 06
"select\n"
+ " sum(l_extendedprice * l_discount) as revenue\n"
+ "from\n"
+ " tpch.lineitem\n"
+ "where\n"
+ "-- l_shipdate >= date '1997-01-01'\n"
+ "-- and l_shipdate < date '1997-01-01' + interval '1' year\n"
+ "-- and\n"
+ " l_discount between 0.03 - 0.01 and 0.03 + 0.01\n"
+ " and l_quantity < 24",
// 07
"select\n"
+ " supp_nation,\n"
+ " cust_nation,\n"
+ " l_year,\n"
+ " sum(volume) as revenue\n"
+ "from\n"
+ " (\n"
+ " select\n"
+ " n1.n_name as supp_nation,\n"
+ " n2.n_name as cust_nation,\n"
+ " extract(year from l.l_shipdate) as l_year,\n"
+ " l.l_extendedprice * (1 - l.l_discount) as volume\n"
+ " from\n"
+ " tpch.supplier s,\n"
+ " tpch.lineitem l,\n"
+ " tpch.orders o,\n"
+ " tpch.customer c,\n"
+ " tpch.nation n1,\n"
+ " tpch.nation n2\n"
+ " where\n"
+ " s.s_suppkey = l.l_suppkey\n"
+ " and o.o_orderkey = l.l_orderkey\n"
+ " and c.c_custkey = o.o_custkey\n"
+ " and s.s_nationkey = n1.n_nationkey\n"
+ " and c.c_nationkey = n2.n_nationkey\n"
+ " and (\n"
+ " (n1.n_name = 'EGYPT' and n2.n_name = 'UNITED STATES')\n"
+ " or (n1.n_name = 'UNITED STATES' and n2.n_name = 'EGYPT')\n"
+ " )\n"
+ "-- and l.l_shipdate between date '1995-01-01' and date '1996-12-31'\n"
+ " ) as shipping\n"
+ "group by\n"
+ " supp_nation,\n"
+ " cust_nation,\n"
+ " l_year\n"
+ "order by\n"
+ " supp_nation,\n"
+ " cust_nation,\n"
+ " l_year",
// 08
"select\n"
+ " o_year,\n"
+ " sum(case\n"
+ " when nation = 'EGYPT' then volume\n"
+ " else 0\n"
+ " end) / sum(volume) as mkt_share\n"
+ "from\n"
+ " (\n"
+ " select\n"
+ " extract(year from o.o_orderdate) as o_year,\n"
+ " l.l_extendedprice * (1 - l.l_discount) as volume,\n"
+ " n2.n_name as nation\n"
+ " from\n"
+ " tpch.part p,\n"
+ " tpch.supplier s,\n"
+ " tpch.lineitem l,\n"
+ " tpch.orders o,\n"
+ " tpch.customer c,\n"
+ " tpch.nation n1,\n"
+ " tpch.nation n2,\n"
+ " tpch.region r\n"
+ " where\n"
+ " p.p_partkey = l.l_partkey\n"
+ " and s.s_suppkey = l.l_suppkey\n"
+ " and l.l_orderkey = o.o_orderkey\n"
+ " and o.o_custkey = c.c_custkey\n"
+ " and c.c_nationkey = n1.n_nationkey\n"
+ " and n1.n_regionkey = r.r_regionkey\n"
+ " and r.r_name = 'MIDDLE EAST'\n"
+ " and s.s_nationkey = n2.n_nationkey\n"
+ " and o.o_orderdate between date '1995-01-01' and date '1996-12-31'\n"
+ " and p.p_type = 'PROMO BRUSHED COPPER'\n"
+ " ) as all_nations\n"
+ "group by\n"
+ " o_year\n"
+ "order by\n"
+ " o_year",
// 09
"select\n"
+ " nation,\n"
+ " o_year,\n"
+ " sum(amount) as sum_profit\n"
+ "from\n"
+ " (\n"
+ " select\n"
+ " n_name as nation,\n"
+ " extract(year from o_orderdate) as o_year,\n"
+ " l.l_extendedprice * (1 - l.l_discount) - ps.ps_supplycost * l.l_quantity as amount\n"
+ " from\n"
+ " tpch.part p,\n"
+ " tpch.supplier s,\n"
+ " tpch.lineitem l,\n"
+ " tpch.partsupp ps,\n"
+ " tpch.orders o,\n"
+ " tpch.nation n\n"
+ " where\n"
+ " s.s_suppkey = l.l_suppkey\n"
+ " and ps.ps_suppkey = l.l_suppkey\n"
+ " and ps.ps_partkey = l.l_partkey\n"
+ " and p.p_partkey = l.l_partkey\n"
+ " and o.o_orderkey = l.l_orderkey\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and p.p_name like '%yellow%'\n"
+ " ) as profit\n"
+ "group by\n"
+ " nation,\n"
+ " o_year\n"
+ "order by\n"
+ " nation,\n"
+ " o_year desc",
// 10
"select\n"
+ " c.c_custkey,\n"
+ " c.c_name,\n"
+ " sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,\n"
+ " c.c_acctbal,\n"
+ " n.n_name,\n"
+ " c.c_address,\n"
+ " c.c_phone,\n"
+ " c.c_comment\n"
+ "from\n"
+ " tpch.customer c,\n"
+ " tpch.orders o,\n"
+ " tpch.lineitem l,\n"
+ " tpch.nation n\n"
+ "where\n"
+ " c.c_custkey = o.o_custkey\n"
+ " and l.l_orderkey = o.o_orderkey\n"
+ " and o.o_orderdate >= date '1994-03-01'\n"
+ " and o.o_orderdate < date '1994-03-01' + interval '3' month\n"
+ " and l.l_returnflag = 'R'\n"
+ " and c.c_nationkey = n.n_nationkey\n"
+ "group by\n"
+ " c.c_custkey,\n"
+ " c.c_name,\n"
+ " c.c_acctbal,\n"
+ " c.c_phone,\n"
+ " n.n_name,\n"
+ " c.c_address,\n"
+ " c.c_comment\n"
+ "order by\n"
+ " revenue desc\n"
+ "limit 20",
// 11
"select\n"
+ " ps.ps_partkey,\n"
+ " sum(ps.ps_supplycost * ps.ps_availqty) as \"value\"\n"
+ "from\n"
+ " tpch.partsupp ps,\n"
+ " tpch.supplier s,\n"
+ " tpch.nation n\n"
+ "where\n"
+ " ps.ps_suppkey = s.s_suppkey\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_name = 'JAPAN'\n"
+ "group by\n"
+ " ps.ps_partkey having\n"
+ " sum(ps.ps_supplycost * ps.ps_availqty) > (\n"
+ " select\n"
+ " sum(ps.ps_supplycost * ps.ps_availqty) * 0.0001000000\n"
+ " from\n"
+ " tpch.partsupp ps,\n"
+ " tpch.supplier s,\n"
+ " tpch.nation n\n"
+ " where\n"
+ " ps.ps_suppkey = s.s_suppkey\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_name = 'JAPAN'\n"
+ " )\n"
+ "order by\n"
+ " \"value\" desc",
// 12
"select\n"
+ " l.l_shipmode,\n"
+ " sum(case\n"
+ " when o.o_orderpriority = '1-URGENT'\n"
+ " or o.o_orderpriority = '2-HIGH'\n"
+ " then 1\n"
+ " else 0\n"
+ " end) as high_line_count,\n"
+ " sum(case\n"
+ " when o.o_orderpriority <> '1-URGENT'\n"
+ " and o.o_orderpriority <> '2-HIGH'\n"
+ " then 1\n"
+ " else 0\n"
+ " end) as low_line_count\n"
+ "from\n"
+ " tpch.orders o,\n"
+ " tpch.lineitem l\n"
+ "where\n"
+ " o.o_orderkey = l.l_orderkey\n"
+ " and l.l_shipmode in ('TRUCK', 'REG AIR')\n"
+ " and l.l_commitdate < l.l_receiptdate\n"
+ " and l.l_shipdate < l.l_commitdate\n"
+ "-- and l.l_receiptdate >= date '1994-01-01'\n"
+ "-- and l.l_receiptdate < date '1994-01-01' + interval '1' year\n"
+ "group by\n"
+ " l.l_shipmode\n"
+ "order by\n"
+ " l.l_shipmode",
// 13
"select\n"
+ " c_count,\n"
+ " count(*) as custdist\n"
+ "from\n"
+ " (\n"
+ " select\n"
+ " c.c_custkey,\n"
+ " count(o.o_orderkey)\n"
+ " from\n"
+ " tpch.customer c \n"
+ " left outer join tpch.orders o \n"
+ " on c.c_custkey = o.o_custkey\n"
+ " and o.o_comment not like '%special%requests%'\n"
+ " group by\n"
+ " c.c_custkey\n"
+ " ) as orders (c_custkey, c_count)\n"
+ "group by\n"
+ " c_count\n"
+ "order by\n"
+ " custdist desc,\n"
+ " c_count desc",
// 14
"select\n"
+ " 100.00 * sum(case\n"
+ " when p.p_type like 'PROMO%'\n"
+ " then l.l_extendedprice * (1 - l.l_discount)\n"
+ " else 0\n"
+ " end) / sum(l.l_extendedprice * (1 - l.l_discount)) as promo_revenue\n"
+ "from\n"
+ " tpch.lineitem l,\n"
+ " tpch.part p\n"
+ "where\n"
+ " l.l_partkey = p.p_partkey\n"
+ " and l.l_shipdate >= date '1994-08-01'\n"
+ " and l.l_shipdate < date '1994-08-01' + interval '1' month",
// 15
"with revenue0 (supplier_no, total_revenue) as (\n"
+ " select\n"
+ " l_suppkey,\n"
+ " sum(l_extendedprice * (1 - l_discount))\n"
+ " from\n"
+ " tpch.lineitem\n"
+ " where\n"
+ " l_shipdate >= date '1993-05-01'\n"
+ " and l_shipdate < date '1993-05-01' + interval '3' month\n"
+ " group by\n"
+ " l_suppkey)\n"
+ "select\n"
+ " s.s_suppkey,\n"
+ " s.s_name,\n"
+ " s.s_address,\n"
+ " s.s_phone,\n"
+ " r.total_revenue\n"
+ "from\n"
+ " tpch.supplier s,\n"
+ " revenue0 r\n"
+ "where\n"
+ " s.s_suppkey = r.supplier_no\n"
+ " and r.total_revenue = (\n"
+ " select\n"
+ " max(total_revenue)\n"
+ " from\n"
+ " revenue0\n"
+ " )\n"
+ "order by\n"
+ " s.s_suppkey",
// 16
"select\n"
+ " p.p_brand,\n"
+ " p.p_type,\n"
+ " p.p_size,\n"
+ " count(distinct ps.ps_suppkey) as supplier_cnt\n"
+ "from\n"
+ " tpch.partsupp ps,\n"
+ " tpch.part p\n"
+ "where\n"
+ " p.p_partkey = ps.ps_partkey\n"
+ " and p.p_brand <> 'Brand#21'\n"
+ " and p.p_type not like 'MEDIUM PLATED%'\n"
+ " and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)\n"
+ " and ps.ps_suppkey not in (\n"
+ " select\n"
+ " s_suppkey\n"
+ " from\n"
+ " tpch.supplier\n"
+ " where\n"
+ " s_comment like '%Customer%Complaints%'\n"
+ " )\n"
+ "group by\n"
+ " p.p_brand,\n"
+ " p.p_type,\n"
+ " p.p_size\n"
+ "order by\n"
+ " supplier_cnt desc,\n"
+ " p.p_brand,\n"
+ " p.p_type,\n"
+ " p.p_size",
// 17
"select\n"
+ " sum(l.l_extendedprice) / 7.0 as avg_yearly\n"
+ "from\n"
+ " tpch.lineitem l,\n"
+ " tpch.part p\n"
+ "where\n"
+ " p.p_partkey = l.l_partkey\n"
+ " and p.p_brand = 'Brand#13'\n"
+ " and p.p_container = 'JUMBO CAN'\n"
+ " and l.l_quantity < (\n"
+ " select\n"
+ " 0.2 * avg(l2.l_quantity)\n"
+ " from\n"
+ " tpch.lineitem l2\n"
+ " where\n"
+ " l2.l_partkey = p.p_partkey\n"
+ " )",
// 18
"select\n"
+ " c.c_name,\n"
+ " c.c_custkey,\n"
+ " o.o_orderkey,\n"
+ " o.o_orderdate,\n"
+ " o.o_totalprice,\n"
+ " sum(l.l_quantity)\n"
+ "from\n"
+ " tpch.customer c,\n"
+ " tpch.orders o,\n"
+ " tpch.lineitem l\n"
+ "where\n"
+ " o.o_orderkey in (\n"
+ " select\n"
+ " l_orderkey\n"
+ " from\n"
+ " tpch.lineitem\n"
+ " group by\n"
+ " l_orderkey having\n"
+ " sum(l_quantity) > 313\n"
+ " )\n"
+ " and c.c_custkey = o.o_custkey\n"
+ " and o.o_orderkey = l.l_orderkey\n"
+ "group by\n"
+ " c.c_name,\n"
+ " c.c_custkey,\n"
+ " o.o_orderkey,\n"
+ " o.o_orderdate,\n"
+ " o.o_totalprice\n"
+ "order by\n"
+ " o.o_totalprice desc,\n"
+ " o.o_orderdate\n"
+ "limit 100",
// 19
"select\n"
+ " sum(l.l_extendedprice* (1 - l.l_discount)) as revenue\n"
+ "from\n"
+ " tpch.lineitem l,\n"
+ " tpch.part p\n"
+ "where\n"
+ " (\n"
+ " p.p_partkey = l.l_partkey\n"
+ " and p.p_brand = 'Brand#41'\n"
+ " and p.p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')\n"
+ " and l.l_quantity >= 2 and l.l_quantity <= 2 + 10\n"
+ " and p.p_size between 1 and 5\n"
+ " and l.l_shipmode in ('AIR', 'AIR REG')\n"
+ " and l.l_shipinstruct = 'DELIVER IN PERSON'\n"
+ " )\n"
+ " or\n"
+ " (\n"
+ " p.p_partkey = l.l_partkey\n"
+ " and p.p_brand = 'Brand#13'\n"
+ " and p.p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')\n"
+ " and l.l_quantity >= 14 and l.l_quantity <= 14 + 10\n"
+ " and p.p_size between 1 and 10\n"
+ " and l.l_shipmode in ('AIR', 'AIR REG')\n"
+ " and l.l_shipinstruct = 'DELIVER IN PERSON'\n"
+ " )\n"
+ " or\n"
+ " (\n"
+ " p.p_partkey = l.l_partkey\n"
+ " and p.p_brand = 'Brand#55'\n"
+ " and p.p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')\n"
+ " and l.l_quantity >= 23 and l.l_quantity <= 23 + 10\n"
+ " and p.p_size between 1 and 15\n"
+ " and l.l_shipmode in ('AIR', 'AIR REG')\n"
+ " and l.l_shipinstruct = 'DELIVER IN PERSON'\n"
+ " )",
// 20
"select\n"
+ " s.s_name,\n"
+ " s.s_address\n"
+ "from\n"
+ " tpch.supplier s,\n"
+ " tpch.nation n\n"
+ "where\n"
+ " s.s_suppkey in (\n"
+ " select\n"
+ " ps.ps_suppkey\n"
+ " from\n"
+ " tpch.partsupp ps\n"
+ " where\n"
+ " ps. ps_partkey in (\n"
+ " select\n"
+ " p.p_partkey\n"
+ " from\n"
+ " tpch.part p\n"
+ " where\n"
+ " p.p_name like 'antique%'\n"
+ " )\n"
+ " and ps.ps_availqty > (\n"
+ " select\n"
+ " 0.5 * sum(l.l_quantity)\n"
+ " from\n"
+ " tpch.lineitem l\n"
+ " where\n"
+ " l.l_partkey = ps.ps_partkey\n"
+ " and l.l_suppkey = ps.ps_suppkey\n"
+ " and l.l_shipdate >= date '1993-01-01'\n"
+ " and l.l_shipdate < date '1993-01-01' + interval '1' year\n"
+ " )\n"
+ " )\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_name = 'KENYA'\n"
+ "order by\n"
+ " s.s_name",
// 21
"select\n"
+ " s.s_name,\n"
+ " count(*) as numwait\n"
+ "from\n"
+ " tpch.supplier s,\n"
+ " tpch.lineitem l1,\n"
+ " tpch.orders o,\n"
+ " tpch.nation n\n"
+ "where\n"
+ " s.s_suppkey = l1.l_suppkey\n"
+ " and o.o_orderkey = l1.l_orderkey\n"
+ " and o.o_orderstatus = 'F'\n"
+ " and l1.l_receiptdate > l1.l_commitdate\n"
+ " and exists (\n"
+ " select\n"
+ " *\n"
+ " from\n"
+ " tpch.lineitem l2\n"
+ " where\n"
+ " l2.l_orderkey = l1.l_orderkey\n"
+ " and l2.l_suppkey <> l1.l_suppkey\n"
+ " )\n"
+ " and not exists (\n"
+ " select\n"
+ " *\n"
+ " from\n"
+ " tpch.lineitem l3\n"
+ " where\n"
+ " l3.l_orderkey = l1.l_orderkey\n"
+ " and l3.l_suppkey <> l1.l_suppkey\n"
+ " and l3.l_receiptdate > l3.l_commitdate\n"
+ " )\n"
+ " and s.s_nationkey = n.n_nationkey\n"
+ " and n.n_name = 'BRAZIL'\n"
+ "group by\n"
+ " s.s_name\n"
+ "order by\n"
+ " numwait desc,\n"
+ " s.s_name\n"
+ "limit 100",
// 22
"select\n"
+ " cntrycode,\n"
+ " count(*) as numcust,\n"
+ " sum(c_acctbal) as totacctbal\n"
+ "from\n"
+ " (\n"
+ " select\n"
+ " substring(c_phone from 1 for 2) as cntrycode,\n"
+ " c_acctbal\n"
+ " from\n"
+ " tpch.customer c\n"
+ " where\n"
+ " substring(c_phone from 1 for 2) in\n"
+ " ('24', '31', '11', '16', '21', '20', '34')\n"
+ " and c_acctbal > (\n"
+ " select\n"
+ " avg(c_acctbal)\n"
+ " from\n"
+ " tpch.customer\n"
+ " where\n"
+ " c_acctbal > 0.00\n"
+ " and substring(c_phone from 1 for 2) in\n"
+ " ('24', '31', '11', '16', '21', '20', '34')\n"
+ " )\n"
+ " and not exists (\n"
+ " select\n"
+ " *\n"
+ " from\n"
+ " tpch.orders o\n"
+ " where\n"
+ " o.o_custkey = c.c_custkey\n"
+ " )\n"
+ " ) as custsale\n"
+ "group by\n"
+ " cntrycode\n"
+ "order by\n"
+ " cntrycode");
private CalciteAssert.AssertThat with()
{
return CalciteAssert.model(TPCH_MODEL)
.enable(true);
}
private void runQuery(int i)
{
with().query(QUERIES.get(i - 1).replaceAll("tpch\\.", "tpch_01."))
.runs();
}
@Setup
public void setUp()
{
// Data generation
TpchSchema.generateTpchData(1D, 1, 1, true);
}
@Benchmark
public void benchmarkQuery01()
{
runQuery(1);
}
@Benchmark
public void benchmarkQuery02()
{
runQuery(2);
}
@Benchmark
public void benchmarkQuery03()
{
runQuery(3);
}
@Benchmark
public void benchmarkQuery04()
{
runQuery(4);
}
@Benchmark
public void benchmarkQuery05()
{
runQuery(5);
}
@Benchmark
public void benchmarkQuery06()
{
runQuery(6);
}
@Benchmark
public void benchmarkQuery07()
{
runQuery(7);
}
@Benchmark
public void benchmarkQuery08()
{
runQuery(8);
}
@Benchmark
public void benchmarkQuery09()
{
runQuery(9);
}
@Benchmark
public void benchmarkQuery10()
{
runQuery(10);
}
@Benchmark
public void benchmarkQuery11()
{
runQuery(11);
}
@Benchmark
public void benchmarkQuery12()
{
runQuery(12);
}
@Benchmark
public void benchmarkQuery13()
{
runQuery(13);
}
@Benchmark
public void benchmarkQuery14()
{
runQuery(14);
}
@Benchmark
public void benchmarkQuery15()
{
runQuery(15);
}
@Benchmark
public void benchmarkQuery16()
{
runQuery(16);
}
@Benchmark
public void benchmarkQuery17()
{
runQuery(17);
}
@Benchmark
public void benchmarkQuery18()
{
runQuery(18);
}
@Benchmark
public void benchmarkQuery19()
{
runQuery(19);
}
@Benchmark
public void benchmarkQuery20()
{
runQuery(20);
}
@Benchmark
public void benchmarkQuery21()
{
runQuery(21);
}
@Benchmark
public void benchmarkQuery22()
{
runQuery(22);
}
}