This is an automated email from the ASF dual-hosted git repository.
jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 71e80e8957 [typo](docs)Performance test documentation update (#14147)
71e80e8957 is described below
commit 71e80e8957855095e291bd92bd29d6dca1bd2d89
Author: jiafeng.zhang <[email protected]>
AuthorDate: Sun Nov 20 09:40:57 2022 +0800
[typo](docs)Performance test documentation update (#14147)
* Performance test documentation update
---
docs/en/docs/benchmark/ssb.md | 390 ++++++++++++----
docs/en/docs/benchmark/tpch.md | 750 +++++++++++++++++++++++++++++--
docs/images/image-20220614114351241.png | Bin 92508 -> 0 bytes
docs/images/ssb.png | Bin 0 -> 84078 bytes
docs/images/ssb_flat.png | Bin 0 -> 83556 bytes
docs/images/ssb_v11_v015_compare.png | Bin 86762 -> 0 bytes
docs/images/tpch.png | Bin 0 -> 81771 bytes
docs/sidebars.json | 8 +
docs/zh-CN/docs/benchmark/ssb.md | 409 +++++++++++++----
docs/zh-CN/docs/benchmark/tpch.md | 757 +++++++++++++++++++++++++++++---
10 files changed, 2055 insertions(+), 259 deletions(-)
diff --git a/docs/en/docs/benchmark/ssb.md b/docs/en/docs/benchmark/ssb.md
index 900909485e..098a3a0a27 100644
--- a/docs/en/docs/benchmark/ssb.md
+++ b/docs/en/docs/benchmark/ssb.md
@@ -28,15 +28,21 @@ under the License.
[Star Schema Benchmark(SSB)](https://www.cs.umb.edu/~poneil/StarSchemaB.PDF)
is a performance test set in a lightweight data warehouse scenario. Based on
[TPC-H](http://www.tpc.org/tpch/), SSB provides a simplified version of the
star schema dataset, which is mainly used to test the performance of
multi-table association queries under the star schema. In addition, the
industry usually flattens SSB as a wide table model (hereinafter referred to
as: SSB flat) to test the performance of t [...]
-This document mainly introduces the performance of Doris on the SSB test set.
+This document mainly introduces the performance of Doris on the SSB 100G test
set.
> Note 1: The standard test set including SSB is usually far from the actual
> business scenario, and some tests will perform parameter tuning for the test
> set. Therefore, the test results of the standard test set can only reflect
> the performance of the database in specific scenarios. Users are advised to
> conduct further testing with actual business data.
>
> Note 2: The operations involved in this document are all performed in the
> Ubuntu Server 20.04 environment, and CentOS 7 can also be tested.
-On 13 queries on the SSB standard test dataset, we tested the upcoming Doris
1.1 version and Doris 0.15.0 RC04 version peer-to-peer, and the overall
performance improved by 2-3 times.
+We conducted comparative tests on the 13 queries of the SSB standard test dataset using Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04.
-
+The overall performance on SSB FLAT wide tables improved nearly 4x on Apache Doris 1.2.0-rc01 compared to Apache Doris 1.1.3, and nearly 10x compared to Apache Doris 0.15.0 RC04.
+
+
+
+On the standard SSB test SQL, Apache Doris 1.2.0-rc01 delivers an overall
performance improvement of nearly 2X over Apache Doris 1.1.3 and nearly 31X
over Apache Doris 0.15.0 RC04.
+
+
## 1. Hardware Environment
@@ -52,7 +58,7 @@ On 13 queries on the SSB standard test dataset, we tested the
upcoming Doris 1.1
- Doris deploys 3BE 1FE;
- Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
- OS version: Ubuntu Server 20.04 LTS 64 bit
-- Doris software version: Apache Doris 1.1, Apache Doris 0.15.0 RC04
+- Doris software version: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, Apache Doris 0.15.0 RC04
- JDK: openjdk version "11.0.14" 2022-01-18
## 3. Test data volume
@@ -68,111 +74,129 @@ On 13 queries on the SSB standard test dataset, we tested
the upcoming Doris 1.1
## 4. Test Results
-Here we use the upcoming Doris-1.1 version and Doris-0.15.0 RC04 version for
comparative testing. The test results are as follows:
-
-| Query | Doris-1.1(ms) | Doris-0.15.0 RC04(ms) |
-| ----- | ------------- | --------------------- |
-| Q1.1 | 90 | 250 |
-| Q1.2 | 10 | 30 |
-| Q1.3 | 70 | 120 |
-| Q2.1 | 360 | 900 |
-| Q2.2 | 340 | 1020 |
-| Q2.3 | 260 | 770 |
-| Q3.1 | 550 | 1710 |
-| Q3.2 | 290 | 670 |
-| Q3.3 | 240 | 550 |
-| Q3.4 | 20 | 30 |
-| Q4.1 | 480 | 1250 |
-| Q4.2 | 240 | 400 |
-| Q4.3 | 200 | 330 |
+Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris
0.15.0 RC04 versions for comparative testing, with the following results.
+
+| Query | Apache Doris 1.2.0-rc01 (ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
+| ----- | ---------------------------- | ----------------------- | ---------------------- |
+| Q1.1 | 20 | 90 | 250 |
+| Q1.2 | 10 | 10 | 30 |
+| Q1.3 | 30 | 70 | 120 |
+| Q2.1 | 90 | 360 | 900 |
+| Q2.2 | 90 | 340 | 1020 |
+| Q2.3 | 60 | 260 | 770 |
+| Q3.1 | 160 | 550 | 1710 |
+| Q3.2 | 80 | 290 | 670 |
+| Q3.3 | 90 | 240 | 550 |
+| Q3.4 | 20 | 20 | 30 |
+| Q4.1 | 140 | 480 | 1250 |
+| Q4.2 | 50 | 240 | 400 |
+| Q4.3 | 30 | 200 | 330 |
+| Total | 880 | 3150 | 8030 |
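The headline speedups quoted at the top of this document follow directly from the totals row above; a quick arithmetic check:

```python
# Totals (ms) from the SSB FLAT results table above
totals_ms = {"1.2.0-rc01": 880, "1.1.3": 3150, "0.15.0 RC04": 8030}
base = totals_ms["1.2.0-rc01"]
for version, ms in totals_ms.items():
    # 1.1.3 comes out ~3.6x slower, 0.15.0 RC04 ~9.1x slower
    print(f"{version}: {ms / base:.1f}x")
```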
**Interpretation of results**
- The data set corresponding to the test results is at scale 100, about 600 million rows.
- The test environment matches a common user configuration: 4 cloud servers (16-core 64G, SSD) with 1 FE and 3 BE deployed.
- Testing with a common user configuration reduces users' selection and evaluation costs without consuming excessive hardware resources during the test.
-- The test results are averaged over 3 executions. And the data has been fully
compacted (if the data is tested immediately after the data is imported, the
query delay may be higher than the test result, and the speed of compaction is
being continuously optimized and will be significantly reduced in the future).
-## 5. Environment Preparation
-Please refer to the [official document](../install/install-deploy.md) to
install and deploy Doris to obtain a normal running Doris cluster (at least 1
FE 1 BE, 1 FE 3 BE is recommended).
+## 5. Standard SSB Test Results
-You can modify BE's configuration file be.conf to add the following
configuration items and restart BE for better query performance.
+Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris
0.15.0 RC04 versions for comparative testing, with the following results.
-```shell
-enable_storage_vectorization=true
-enable_low_cardinality_optimize=true
-```
+| Query | Apache Doris 1.2.0-rc01 (ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
+| ----- | ---------------------------- | ----------------------- | ---------------------- |
+| Q1.1 | 40 | 18 | 350 |
+| Q1.2 | 30 | 100 | 80 |
+| Q1.3 | 20 | 70 | 80 |
+| Q2.1 | 350 | 940 | 20680 |
+| Q2.2 | 320 | 750 | 18250 |
+| Q2.3 | 300 | 720 | 14760 |
+| Q3.1 | 650 | 2150 | 22190 |
+| Q3.2 | 260 | 510 | 8360 |
+| Q3.3 | 220 | 450 | 6200 |
+| Q3.4 | 60 | 70 | 160 |
+| Q4.1 | 840 | 1480 | 24320 |
+| Q4.2 | 460 | 560 | 6310 |
+| Q4.3 | 610 | 660 | 10170 |
+| Total | 4160 | 8478 | 131910 |
-The scripts covered in the following documents are stored in
`tools/ssb-tools/` in the Doris codebase.
+**Interpretation of results**
-> **Notice:**
->
-> The above two parameters do not have these two parameters in version 0.15.0
RC04 and do not need to be configured.
+- The data set corresponding to the test results is at scale 100, about 600 million rows.
+- The test environment matches a common user configuration: 4 cloud servers (16-core 64G, SSD) with 1 FE and 3 BE deployed.
+- Testing with a common user configuration reduces users' selection and evaluation costs without consuming excessive hardware resources during the test.
+
+## 6. Environment Preparation
+
+Please first refer to the [official documentation](../install/install-deploy.md) for Apache Doris installation and deployment to get a working Doris cluster (at least 1 FE 1 BE, 1 FE 3 BE recommended).
+
+The scripts covered in the following documentation are stored in the Apache
Doris codebase:
[ssb-tools](https://github.com/apache/doris/tree/master/tools/ssb-tools)
-## 6. Data Preparation
+## 7. Data Preparation
-### 6.1 Download and install the SSB data generation tool.
+### 7.1 Download and install the SSB data generation tool.
Execute the following script to download and compile the
[ssb-dbgen](https://github.com/electrum/ssb-dbgen.git) tool.
```shell
-bash bin/build-ssb-dbgen.sh
-```
+sh build-ssb-dbgen.sh
+````
-After successful installation, the `dbgen` binary will be generated in the
`bin/ssb-dbgen/` directory.
+After successful installation, the `dbgen` binary will be generated in the
`ssb-dbgen/` directory.
-### 6.2 Generate SSB test set
+### 7.2 Generate SSB test set
Execute the following script to generate the SSB dataset:
```shell
-bash bin/gen-ssb-data.sh
-```
+sh gen-ssb-data.sh -s 100 -c 100
+````
-> Note 1: See script help with `bash gen-ssb-data.sh -h`.The default scale
factor is 100 (referred to as sf100 for short). By default, it takes 6 minutes
to generate 10 data files, namely `bash bin/gen-ssb-data.sh -s 100 -c 10`.
+> Note 1: See script help with `sh gen-ssb-data.sh -h`.
>
-> Note 2: The data will be generated in the directory `bin/ssb-data/` with the
suffix`. tbl`. The total file size is about 60GB. The generation time may vary
from several minutes to one hour, and the information of the generated files
will be listed after the generation is completed.
+> Note 2: The data will be generated in the `ssb-data/` directory with the
suffix `.tbl`. The total file size is about 60GB. The generation time may vary
from a few minutes to an hour.
>
-> Note 3: `-s 100` indicates that the test set scale factor is 100, `-c 10`
indicates that 10 concurrent threads generate data for the lineorder table. The
`-c` parameter also determines the number of files in the final lineorder
table. The larger the parameter, the larger the number of files and the smaller
each file. Use the default parameters to test sf100, and `-s1000 -c100` to test
sf1000.
+> Note 3: `-s 100` indicates that the test set scale factor is 100, and `-c 100` indicates that 100 concurrent threads generate the data for the lineorder table. The `-c` parameter also determines the number of files in the final lineorder table: the larger the parameter, the more files there are and the smaller each file is.
With the `-s 100` parameter, the resulting dataset size is:
| Table | Rows | Size | File Number |
| --------- | ---------------- | ---- | ----------- |
-| lineorder | 6亿(600037902) | 60GB | 10 |
+| lineorder | 600 million (600,037,902) | 60GB | 100 |
| customer | 3 million (3,000,000) | 277M | 1 |
| part | 1.4 million (1,400,000) | 116M | 1 |
| supplier | 200,000 | 17M | 1 |
-| dates | 2556 | 228K | 1 |
+| date | 2556 | 228K | 1 |
+
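As a quick sanity check of the table above, `-c 100` splits the 60GB lineorder data evenly across 100 files, so each file holds roughly 6 million rows:

```python
# lineorder figures from the table above
rows, size_gb, files = 600_037_902, 60, 100
print(rows // files)           # rows per file
print(size_gb * 1024 / files)  # approximate MB per file
```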
+### 7.3 Create table
-### 6.3 Create table
+#### 7.3.1 Prepare the `doris-cluster.conf` file.
-#### 6.3.1 Prepare the `conf/doris-cluster.conf` file.
+Before calling the import script, you need to write the FE's ip port and other
information in the `doris-cluster.conf` file.
-Before calling the import script, you need to write the FE's ip port and other
information in the `conf/doris-cluster.conf` file.
+The file is located at the same level as `load-ssb-dimension-data.sh`.
The contents of the file include FE's ip, HTTP port, user name, password and
the DB name of the data to be imported:
```shell
-export FE_HOST="127.0.0.1"
+export FE_HOST="xxx"
export FE_HTTP_PORT="8030"
export FE_QUERY_PORT="9030"
export USER="root"
-export PASSWORD=""
+export PASSWORD='xxx'
export DB="ssb"
```
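The load scripts read these variables to reach the FE over HTTP. As an illustration only (the actual scripts may differ), a Stream Load request built from them might look like the following, with `customer` as a hypothetical example table:

```shell
# Assumed usage of the doris-cluster.conf variables (illustrative sketch,
# not the actual load script).
FE_HOST="xxx"
FE_HTTP_PORT="8030"
DB="ssb"
TABLE="customer"   # hypothetical example table
LOAD_URL="http://${FE_HOST}:${FE_HTTP_PORT}/api/${DB}/${TABLE}/_stream_load"
echo "$LOAD_URL"
# An upload would then be issued roughly as:
#   curl --location-trusted -u "$USER:$PASSWORD" -T ssb-data/customer.tbl \
#        -H "column_separator:|" "$LOAD_URL"
```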
-#### 6.3.2 Execute the following script to generate and create the SSB table:
+#### 7.3.2 Execute the following script to generate and create the SSB table:
```shell
-bash bin/create-ssb-tables.sh
-```
+sh create-ssb-tables.sh
+````
-Or copy the build table in
[create-ssb-tables.sql](https://github.com/apache/doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql)
Statement, executed in Doris.
-copy
[create-ssb-flat-table.sql](https://github.com/apache/doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql)
The table building statement in , executed in Doris.
+Or copy the table creation statements in [create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql) and [create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql) and execute them in the MySQL client.
-Below is the `lineorder_flat` table building statement. The "lineorder_flat"
table is created in the above `bin/create-ssb-flat-table.sh` script with the
default number of buckets (48 buckets). You can delete this table and adjust
the number of buckets according to your cluster size node configuration, so as
to obtain a better test effect.
+The following is the `lineorder_flat` table creation statement. The `lineorder_flat` table is created in the `create-ssb-flat-table.sh` script above with the default number of buckets (48). You can delete this table and tune the bucket count according to your cluster size and node configuration to get better test results.
```sql
CREATE TABLE `lineorder_flat` (
@@ -234,27 +258,30 @@ PROPERTIES (
);
```
-### 6.4 Import data
+### 7.4 Import data
-The following script will connects Doirs to import according to the parameters
in ` conf/Doris-cluster.conf`, including imports four dimension tables
(customer, part, supplier and date) which has a small amount of data in single
thread, simultaneously imports one fact table (lineorder), and imports a wide
table (lineorder_flat) by' INSERT INTO ... SELECT ...'.
+Use the following command to import all of the SSB test set data, and to synthesize the SSB FLAT wide table data and import it into the table.
```shell
-bash bin/load-ssb-data.sh
+sh bin/load-ssb-data.sh -c 10
```
-> Note 1: Check the script help through `bash bin/load-ssb-data.sh-h`, and by
default, it will start 5 threads to import lineorder concurrently, that is `-c
5`. If more threads are started, the import speed can be accelerated, but
additional memory overhead will be added.
->
-> Note 2: For faster import speed, you can restart BE after adding
`flush_thread_num_per_store=5` in be.conf. This configuration indicates the
number of disk write threads for each data directory, and the default is 2.
Larger data can improve write data throughput, but may increase IO Util.
(Reference value: 1 mechanical disk, when the default is 2, the IO Util during
the import process is about 12%, and when it is set to 5, the IO Util is about
26%. If it is an SSD disk, it is almost 0) .
+`-c 10` means starting 10 concurrent threads for the import (the default is 5). With a single BE node, after the lineorder data generated by `sh gen-ssb-data.sh -s 100 -c 100` is imported, the data for the ssb-flat table is also generated at the end. Starting more threads can speed up the import, but adds extra memory overhead.
+
+> Notes:
>
-> Note 3: It cost about 389s in loading customer, part, supplier, date and
lineorder, and 740s in inserting into lineorder_flat.
+> 1. For faster imports, you can add `flush_thread_num_per_store=5` to be.conf and restart BE. This configuration indicates the number of write threads per data directory; the default is 2. Larger values can improve write throughput, but may increase IO Util. (Reference values: with 1 mechanical disk, IO Util during import is about 12% at the default of 2 and about 26% when set to 5; on an SSD it is almost 0.)
+>
+> 2. The flat table data is imported using the `INSERT INTO ... SELECT ...` method.
+
+### 7.5 Check imported data
-### 6.5 Check imported data
```sql
select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
-select count(*) from dates;
+select count(*) from date;
select count(*) from lineorder;
select count(*) from lineorder_flat;
```
@@ -268,19 +295,11 @@ The amount of data should be the same as the number of
rows that generate the da
| customer | 3 million (3,000,000) | 277 MB | 138.247 MB |
| part | 1.4 million (1,400,000) | 116 MB | 12.759 MB |
| supplier | 200,000 | 17 MB | 9.143 MB |
-| dates | 2556 | 228 KB | 34.276 KB |
-
-### 6.6 Query test
-
-#### 6.6.1 Test script
+| date | 2556 | 228 KB | 34.276 KB |
-The following script connects Doris according to the parameters in `
conf/Doris-cluster.conf`, and prints out the rows of each table before
executing the query.
-
-```shell
-bash bin/run-ssb-flat-queries.sh
-```
+### 7.6 Query test
-#### 6.6.2 Test SQL
+#### 7.6.1 SSB FLAT Test SQL
```sql
--Q1.1
@@ -366,3 +385,220 @@ WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >=
19970101 AND LO_ORDERDATE <
GROUP BY YEAR, S_CITY, P_BRAND
ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
```
+
+#### 7.6.2 SSB Standard Test SQL
+
+```SQL
+--Q1.1
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_year = 1993
+ AND lo_discount BETWEEN 1 AND 3
+ AND lo_quantity < 25;
+--Q1.2
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_yearmonth = 'Jan1994'
+ AND lo_discount BETWEEN 4 AND 6
+ AND lo_quantity BETWEEN 26 AND 35;
+
+--Q1.3
+SELECT
+ SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_weeknuminyear = 6
+ AND d_year = 1994
+ AND lo_discount BETWEEN 5 AND 7
+ AND lo_quantity BETWEEN 26 AND 35;
+
+--Q2.1
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_category = 'MFGR#12'
+ AND s_region = 'AMERICA'
+GROUP BY d_year, p_brand
+ORDER BY p_brand;
+
+--Q2.2
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
+ AND s_region = 'ASIA'
+GROUP BY d_year, p_brand
+ORDER BY d_year, p_brand;
+
+--Q2.3
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_brand = 'MFGR#2239'
+ AND s_region = 'EUROPE'
+GROUP BY d_year, p_brand
+ORDER BY d_year, p_brand;
+
+--Q3.1
+SELECT
+ c_nation,
+ s_nation,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'ASIA'
+ AND s_region = 'ASIA'
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_nation, s_nation, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.2
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND c_nation = 'UNITED STATES'
+ AND s_nation = 'UNITED STATES'
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.3
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND (
+ c_city = 'UNITED KI1'
+ OR c_city = 'UNITED KI5'
+ )
+ AND (
+ s_city = 'UNITED KI1'
+ OR s_city = 'UNITED KI5'
+ )
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.4
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND (
+ c_city = 'UNITED KI1'
+ OR c_city = 'UNITED KI5'
+ )
+ AND (
+ s_city = 'UNITED KI1'
+ OR s_city = 'UNITED KI5'
+ )
+ AND d_yearmonth = 'Dec1997'
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q4.1
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=4,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ c_nation,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'AMERICA'
+ AND s_region = 'AMERICA'
+ AND (
+ p_mfgr = 'MFGR#1'
+ OR p_mfgr = 'MFGR#2'
+ )
+GROUP BY d_year, c_nation
+ORDER BY d_year, c_nation;
+
+--Q4.2
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ s_nation,
+ p_category,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'AMERICA'
+ AND s_region = 'AMERICA'
+ AND (
+ d_year = 1997
+ OR d_year = 1998
+ )
+ AND (
+ p_mfgr = 'MFGR#1'
+ OR p_mfgr = 'MFGR#2'
+ )
+GROUP BY d_year, s_nation, p_category
+ORDER BY d_year, s_nation, p_category;
+
+--Q4.3
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ s_city,
+ p_brand,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND s_nation = 'UNITED STATES'
+ AND (
+ d_year = 1997
+ OR d_year = 1998
+ )
+ AND p_category = 'MFGR#14'
+GROUP BY d_year, s_city, p_brand
+ORDER BY d_year, s_city, p_brand;
+```
diff --git a/docs/en/docs/benchmark/tpch.md b/docs/en/docs/benchmark/tpch.md
index d354730aac..b4d4112c59 100644
--- a/docs/en/docs/benchmark/tpch.md
+++ b/docs/en/docs/benchmark/tpch.md
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->
-# TPC-H benchmark
+# TPC-H Benchmark
TPC-H is a Decision Support Benchmark consisting of a set of business-oriented
ad hoc queries and concurrent data modifications. The data that queries and
populates the database has broad industry relevance. This benchmark
demonstrates a decision support system that examines large amounts of data,
executes highly complex queries, and answers critical business questions. The
performance metric reported by TPC-H is called the TPC-H Hourly Compound Query
Performance Metric (QphH@Size) and r [...]
@@ -34,9 +34,9 @@ This document mainly introduces the performance of Doris on
the TPC-H test set.
>
> Note 2: The operations covered in this document are tested on CentOS 7.x.
-On 22 queries on the TPC-H standard test dataset, we tested the upcoming Doris
1.1 version and Doris 0.15.0 RC04 version side by side, and the overall
performance improved by 3-4 times. In individual scenarios, it can achieve a
ten-fold improvement.
+On 22 queries on the TPC-H standard test dataset, we conducted pairwise tests
based on Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0
RC04 versions. The overall performance of Apache Doris 1.2.0-rc01 is nearly 3
times better than that of Apache Doris 1.1.3 and nearly 11 times better than
that of Apache Doris 0.15.0 RC04.
-
+
## 1. Hardware Environment
@@ -53,12 +53,12 @@ On 22 queries on the TPC-H standard test dataset, we tested
the upcoming Doris 1
- Doris deploys 3BE 1FE;
- Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
- OS version: CentOS 7.8
-- Doris software version: Apache Doris 1.1, Apache Doris 0.15.0 RC04
+- Doris software version: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, Apache Doris 0.15.0 RC04
- JDK: openjdk version "11.0.14" 2022-01-18
## 3. Test Data Volume
-The entire test simulation generates 100G of data and is imported into Doris
0.15.0 RC04 and Doris 1.1 versions for testing. The following is the relevant
description of the table and the amount of data.
+The entire test simulation generates 100G of data and is imported into Apache
Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions
for testing. The following is the relevant description of the table and the
amount of data.
| TPC-H Table Name | Rows | data size | remark |
| :--------------- | :---------- | ---------- | :----- |
@@ -73,11 +73,11 @@ The entire test simulation generates 100G of data and is
imported into Doris 0.1
## 4. Test SQL
-TPCH 22 test query statements :
[TPCH-Query-SQL](https://github.com/apache/doris/tree/master/tools/tpch-tools/queries)
+TPCH 22 test query statements: [TPCH-Query-SQL](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools/queries)
-Notice:
+**Notice:**
-The following four parameters in the above SQL do not exist in 0.15.0 RC04.
When executed in 0.15.0 RC04, remove them:
+The following four parameters in the above SQL are not present in Apache Doris 0.15.0 RC04 and should be removed when executing on that version.
```
1. enable_vectorized_engine=true,
@@ -88,40 +88,39 @@ The following four parameters in the above SQL do not exist
in 0.15.0 RC04. When
## 5. Test Result
-Here we use the upcoming Doris-1.1 version and Doris-0.15.0 RC04 version for
comparative testing. The test results are as follows:
-
-| Query | Doris-1.1(s) | 0.15.0 RC04(s) |
-| --------- | ------------ | -------------- |
-| Q1 | 3.75 | 28.63 |
-| Q2 | 4.22 | 7.88 |
-| Q3 | 2.64 | 9.39 |
-| Q4 | 1.5 | 9.3 |
-| Q5 | 2.15 | 4.11 |
-| Q6 | 0.19 | 0.43 |
-| Q7 | 1.04 | 1.61 |
-| Q8 | 1.75 | 50.35 |
-| Q9 | 7.94 | 16.34 |
-| Q10 | 1.41 | 5.21 |
-| Q11 | 0.35 | 1.72 |
-| Q12 | 0.57 | 5.39 |
-| Q13 | 8.15 | 20.88 |
-| Q14 | 0.3 | |
-| Q15 | 0.66 | 1.86 |
-| Q16 | 0.79 | 1.32 |
-| Q17 | 1.51 | 26.67 |
-| Q18 | 3.364 | 11.77 |
-| Q19 | 0.829 | 1.71 |
-| Q20 | 2.77 | 5.2 |
-| Q21 | 4.47 | 10.34 |
-| Q22 | 0.9 | 3.22 |
-| **total** | **51.253** | **223.33** |
+Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions for comparison tests, with the following results.
+
+| Query | Apache Doris 1.2.0-rc01 (s) | Apache Doris 1.1.3 (s) | Apache Doris 0.15.0 RC04 (s) |
+| --------- | --------------------------- | ---------------------- | ---------------------------- |
+| Q1 | 2.12 | 3.75 | 28.63 |
+| Q2 | 0.20 | 4.22 | 7.88 |
+| Q3 | 0.62 | 2.64 | 9.39 |
+| Q4 | 0.61 | 1.5 | 9.3 |
+| Q5 | 1.05 | 2.15 | 4.11 |
+| Q6 | 0.08 | 0.19 | 0.43 |
+| Q7 | 0.58 | 1.04 | 1.61 |
+| Q8 | 0.72 | 1.75 | 50.35 |
+| Q9 | 3.61 | 7.94 | 16.34 |
+| Q10 | 1.26 | 1.41 | 5.21 |
+| Q11 | 0.15 | 0.35 | 1.72 |
+| Q12 | 0.21 | 0.57 | 5.39 |
+| Q13 | 2.62 | 8.15 | 20.88 |
+| Q14 | 0.16 | 0.3 | |
+| Q15 | 0.30 | 0.66 | 1.86 |
+| Q16 | 0.38 | 0.79 | 1.32 |
+| Q17 | 0.65 | 1.51 | 26.67 |
+| Q18 | 2.28 | 3.364 | 11.77 |
+| Q19 | 0.20 | 0.829 | 1.71 |
+| Q20 | 0.21 | 2.77 | 5.2 |
+| Q21 | 1.17 | 4.47 | 10.34 |
+| Q22 | 0.46 | 0.9 | 3.22 |
+| **Total** | **19.64** | **51.253** | **223.33** |
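The "nearly 3 times / nearly 11 times" figures quoted earlier in this document follow from the totals row; checking the ratios:

```python
# Totals (s) from the TPC-H results table above
totals_s = {"1.2.0-rc01": 19.64, "1.1.3": 51.253, "0.15.0 RC04": 223.33}
base = totals_s["1.2.0-rc01"]
for version, s in totals_s.items():
    # 1.1.3 comes out ~2.6x slower, 0.15.0 RC04 ~11.4x slower
    print(f"{version}: {s / base:.1f}x")
```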
- **Result description**
- The data set corresponding to the test results is scale 100, about 600
million.
- The test environment is configured to be commonly used by users, including
4 cloud servers, 16-core 64G SSD, and 1 FE and 3 BE deployment.
- Use common user configuration tests to reduce user selection and
evaluation costs, but will not consume so many hardware resources during the
entire test process.
- - The test results are averaged over 3 executions. And the data has been
fully compacted (if the data is tested immediately after the data is imported,
the query delay may be higher than the test result, and the speed of compaction
is being continuously optimized, and will be significantly reduced in the
future).
- - 0.15 RC04 Q14 execution failed in TPC-H test, unable to complete query.
+ - Apache Doris 0.15 RC04 Q14 execution failed in TPC-H test, unable to
complete query.
## 6. Environment Preparation
@@ -131,10 +130,10 @@ Please refer to the [official
document](../install/install-deploy.md) to install
### 7.1 Download and install the TPC-H data generation tool
-Execute the following script to download and compile the
[tpch-tools](https://github.com/apache/doris/tree/master/tools/tpch-tools) tool.
+Execute the following script to download and compile the
[tpch-tools](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools)
tool.
```shell
-sh bin/build-tpch-dbgen.sh
+sh build-tpch-dbgen.sh
```
After successful installation, the `dbgen` binary will be generated in the
`TPC-H_Tools_v3.0.0/` directory.
@@ -144,12 +143,12 @@ After successful installation, the `dbgen` binary will be
generated in the `TPC-
Execute the following script to generate the TPC-H dataset:
```shell
-sh bin/gen-tpch-data.sh
+sh gen-tpch-data.sh
```
-> Note 1: View script help via `sh bin/gen-tpch-data.sh -h`.
+> Note 1: View script help via `sh gen-tpch-data.sh -h`.
>
-> Note 2: The data will be generated in the `bin/tpch-data/` directory with
the suffix `.tbl`. The total file size is about 100GB. The generation time may
vary from a few minutes to an hour.
+> Note 2: The data will be generated in the `tpch-data/` directory with the
suffix `.tbl`. The total file size is about 100GB. The generation time may vary
from a few minutes to an hour.
>
> Note 3: The standard test data set of 100G is generated by default
@@ -157,7 +156,7 @@ sh bin/gen-tpch-data.sh
#### 7.3.1 Prepare the `doris-cluster.conf` file
-Before calling the import script, you need to write the FE's ip port and other
information in the `conf/doris-cluster.conf` file.
+Before calling the import script, you need to write the FE's ip port and other
information in the `doris-cluster.conf` file.
The file is located at the same level as `load-tpch-data.sh`.
@@ -175,7 +174,7 @@ export USER='root'
# Doris password
export PASSWORD=''
# The database where TPC-H tables located
-export DB='tpch'
+export DB='tpch1'
```
#### 7.3.2 Execute the following script to generate and create the TPC-H table
@@ -183,7 +182,7 @@ export DB='tpch'
```shell
sh create-tpch-tables.sh
```
-Or copy the table creation statement in
[create-tpch-tables.sql](https://github.com/apache/doris/blob/master/tools/tpch-tools/ddl/create-tpch-tables.sql),
Execute in Doris.
+Or copy the table creation statement in
[create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql),
Execute in Doris.
### 7.4 Import Data
@@ -191,12 +190,12 @@ Or copy the table creation statement in
[create-tpch-tables.sql](https://github.
Import the data by executing the following command:
```shell
-sh bin/load-tpch-data.sh
+sh ./load-tpch-data.sh
```
### 7.5 Check Imported Data
-Execute the following SQL statement to check the imported data volume is
consistent with the above data volume.
+Execute the following SQL statements to check that the imported data volume is consistent with the data volume listed above.
```sql
select count(*) from lineitem;
@@ -212,10 +211,12 @@ select count(*) from revenue0;
### 7.6 Query Test
+#### 7.6.1 Executing Query Scripts
+
Execute the above test SQL or execute the following command
```
-sh bin/run-tpch-queries.sh
+./run-tpch-queries.sh
```
>Notice:
@@ -223,3 +224,656 @@ sh bin/run-tpch-queries.sh
>1. At present, the query optimizer and statistics functions of Doris are not
>perfect, so we rewrite some queries in TPC-H to adapt to the execution
>framework of Doris, but it does not affect the correctness of the results
>
>2. Doris' new query optimizer will be released in subsequent versions
+>3. Set `set exec_mem_limit=8G` before executing the query
+
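The per-query hints in the next section pass this limit in bytes (`exec_mem_limit=8589934592`); that value is simply 8G written out:

```python
# 8G in bytes, matching the exec_mem_limit value used in the query hints
print(8 * 1024 ** 3)  # 8589934592
```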
+#### 7.6.2 Single SQL Execution
+
+```SQL
+--Q1
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=false) */
+ l_returnflag,
+ l_linestatus,
+ sum(l_quantity) as sum_qty,
+ sum(l_extendedprice) as sum_base_price,
+ sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
+ sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+ avg(l_quantity) as avg_qty,
+ avg(l_extendedprice) as avg_price,
+ avg(l_discount) as avg_disc,
+ count(*) as count_order
+from
+ lineitem
+where
+ l_shipdate <= date '1998-12-01' - interval '90' day
+group by
+ l_returnflag,
+ l_linestatus
+order by
+ l_returnflag,
+ l_linestatus;
+
+--Q2
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ s_acctbal,
+ s_name,
+ n_name,
+ p_partkey,
+ p_mfgr,
+ s_address,
+ s_phone,
+ s_comment
+from
+ partsupp join
+ (
+ select
+ ps_partkey as a_partkey,
+ min(ps_supplycost) as a_min
+ from
+ partsupp,
+ part,
+ supplier,
+ nation,
+ region
+ where
+ p_partkey = ps_partkey
+ and s_suppkey = ps_suppkey
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'EUROPE'
+ and p_size = 15
+ and p_type like '%BRASS'
+ group by a_partkey
+ ) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
+ part,
+ supplier,
+ nation,
+ region
+where
+ p_partkey = ps_partkey
+ and s_suppkey = ps_suppkey
+ and p_size = 15
+ and p_type like '%BRASS'
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'EUROPE'
+
+order by
+ s_acctbal desc,
+ n_name,
+ s_name,
+ p_partkey
+limit 100;
+
+--Q3
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true,
runtime_filter_wait_time_ms=10000) */
+ l_orderkey,
+ sum(l_extendedprice * (1 - l_discount)) as revenue,
+ o_orderdate,
+ o_shippriority
+from
+ (
+ select l_orderkey, l_extendedprice, l_discount, o_orderdate,
o_shippriority, o_custkey from
+ lineitem join orders
+ where l_orderkey = o_orderkey
+ and o_orderdate < date '1995-03-15'
+ and l_shipdate > date '1995-03-15'
+ ) t1 join customer c
+ on c.c_custkey = t1.o_custkey
+ where c_mktsegment = 'BUILDING'
+group by
+ l_orderkey,
+ o_orderdate,
+ o_shippriority
+order by
+ revenue desc,
+ o_orderdate
+limit 10;
+
+--Q4
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ o_orderpriority,
+ count(*) as order_count
+from
+ (
+ select
+ *
+ from
+ lineitem
+ where l_commitdate < l_receiptdate
+ ) t1
+ right semi join orders
+ on t1.l_orderkey = o_orderkey
+where
+ o_orderdate >= date '1993-07-01'
+ and o_orderdate < date '1993-07-01' + interval '3' month
+group by
+ o_orderpriority
+order by
+ o_orderpriority;
+
+--Q5
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ n_name,
+ sum(l_extendedprice * (1 - l_discount)) as revenue
+from
+ customer,
+ orders,
+ lineitem,
+ supplier,
+ nation,
+ region
+where
+ c_custkey = o_custkey
+ and l_orderkey = o_orderkey
+ and l_suppkey = s_suppkey
+ and c_nationkey = s_nationkey
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'ASIA'
+ and o_orderdate >= date '1994-01-01'
+ and o_orderdate < date '1994-01-01' + interval '1' year
+group by
+ n_name
+order by
+ revenue desc;
+
+--Q6
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ sum(l_extendedprice * l_discount) as revenue
+from
+ lineitem
+where
+ l_shipdate >= date '1994-01-01'
+ and l_shipdate < date '1994-01-01' + interval '1' year
+ and l_discount between .06 - 0.01 and .06 + 0.01
+ and l_quantity < 24;
+
+--Q7
+select /*+SET_VAR(exec_mem_limit=458589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ supp_nation,
+ cust_nation,
+ l_year,
+ sum(volume) as revenue
+from
+ (
+ select
+ n1.n_name as supp_nation,
+ n2.n_name as cust_nation,
+ extract(year from l_shipdate) as l_year,
+ l_extendedprice * (1 - l_discount) as volume
+ from
+ supplier,
+ lineitem,
+ orders,
+ customer,
+ nation n1,
+ nation n2
+ where
+ s_suppkey = l_suppkey
+ and o_orderkey = l_orderkey
+ and c_custkey = o_custkey
+ and s_nationkey = n1.n_nationkey
+ and c_nationkey = n2.n_nationkey
+ and (
+ (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
+ or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
+ )
+ and l_shipdate between date '1995-01-01' and date '1996-12-31'
+ ) as shipping
+group by
+ supp_nation,
+ cust_nation,
+ l_year
+order by
+ supp_nation,
+ cust_nation,
+ l_year;
+
+--Q8
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ o_year,
+ sum(case
+ when nation = 'BRAZIL' then volume
+ else 0
+ end) / sum(volume) as mkt_share
+from
+ (
+ select
+ extract(year from o_orderdate) as o_year,
+ l_extendedprice * (1 - l_discount) as volume,
+ n2.n_name as nation
+ from
+ lineitem,
+ orders,
+ customer,
+ supplier,
+ part,
+ nation n1,
+ nation n2,
+ region
+ where
+ p_partkey = l_partkey
+ and s_suppkey = l_suppkey
+ and l_orderkey = o_orderkey
+ and o_custkey = c_custkey
+ and c_nationkey = n1.n_nationkey
+ and n1.n_regionkey = r_regionkey
+ and r_name = 'AMERICA'
+ and s_nationkey = n2.n_nationkey
+ and o_orderdate between date '1995-01-01' and date '1996-12-31'
+ and p_type = 'ECONOMY ANODIZED STEEL'
+ ) as all_nations
+group by
+ o_year
+order by
+ o_year;
+
+--Q9
+select/*+SET_VAR(exec_mem_limit=37179869184,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true,
enable_remove_no_conjuncts_runtime_filter_policy=true,
runtime_filter_wait_time_ms=100000) */
+ nation,
+ o_year,
+ sum(amount) as sum_profit
+from
+ (
+ select
+ n_name as nation,
+ extract(year from o_orderdate) as o_year,
+ l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as
amount
+ from
+ lineitem join orders on o_orderkey = l_orderkey
+ join[shuffle] part on p_partkey = l_partkey
+ join[shuffle] partsupp on ps_partkey = l_partkey
+ join[shuffle] supplier on s_suppkey = l_suppkey
+ join[broadcast] nation on s_nationkey = n_nationkey
+ where
+ ps_suppkey = l_suppkey and
+ p_name like '%green%'
+ ) as profit
+group by
+ nation,
+ o_year
+order by
+ nation,
+ o_year desc;
+
+--Q10
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ c_custkey,
+ c_name,
+ sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
+ c_acctbal,
+ n_name,
+ c_address,
+ c_phone,
+ c_comment
+from
+ customer,
+ (
+ select o_custkey,l_extendedprice,l_discount from lineitem, orders
+ where l_orderkey = o_orderkey
+ and o_orderdate >= date '1993-10-01'
+ and o_orderdate < date '1993-10-01' + interval '3' month
+ and l_returnflag = 'R'
+ ) t1,
+ nation
+where
+ c_custkey = t1.o_custkey
+ and c_nationkey = n_nationkey
+group by
+ c_custkey,
+ c_name,
+ c_acctbal,
+ c_phone,
+ n_name,
+ c_address,
+ c_comment
+order by
+ revenue desc
+limit 20;
+
+--Q11
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ ps_partkey,
+ sum(ps_supplycost * ps_availqty) as value
+from
+ partsupp,
+ (
+ select s_suppkey
+ from supplier, nation
+ where s_nationkey = n_nationkey and n_name = 'GERMANY'
+ ) B
+where
+ ps_suppkey = B.s_suppkey
+group by
+ ps_partkey having
+ sum(ps_supplycost * ps_availqty) > (
+ select
+ sum(ps_supplycost * ps_availqty) * 0.000002
+ from
+ partsupp,
+ (select s_suppkey
+ from supplier, nation
+ where s_nationkey = n_nationkey and n_name = 'GERMANY'
+ ) A
+ where
+ ps_suppkey = A.s_suppkey
+ )
+order by
+ value desc;
+
+--Q12
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ l_shipmode,
+ sum(case
+ when o_orderpriority = '1-URGENT'
+ or o_orderpriority = '2-HIGH'
+ then 1
+ else 0
+ end) as high_line_count,
+ sum(case
+ when o_orderpriority <> '1-URGENT'
+ and o_orderpriority <> '2-HIGH'
+ then 1
+ else 0
+ end) as low_line_count
+from
+ orders,
+ lineitem
+where
+ o_orderkey = l_orderkey
+ and l_shipmode in ('MAIL', 'SHIP')
+ and l_commitdate < l_receiptdate
+ and l_shipdate < l_commitdate
+ and l_receiptdate >= date '1994-01-01'
+ and l_receiptdate < date '1994-01-01' + interval '1' year
+group by
+ l_shipmode
+order by
+ l_shipmode;
+
+--Q13
+select /*+SET_VAR(exec_mem_limit=45899345920,
parallel_fragment_exec_instance_num=16, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ c_count,
+ count(*) as custdist
+from
+ (
+ select
+ c_custkey,
+ count(o_orderkey) as c_count
+ from
+ orders right outer join customer on
+ c_custkey = o_custkey
+ and o_comment not like '%special%requests%'
+ group by
+ c_custkey
+ ) as c_orders
+group by
+ c_count
+order by
+ custdist desc,
+ c_count desc;
+
+--Q14
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true,
runtime_filter_mode=OFF) */
+ 100.00 * sum(case
+ when p_type like 'PROMO%'
+ then l_extendedprice * (1 - l_discount)
+ else 0
+ end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
+from
+ part,
+ lineitem
+where
+ l_partkey = p_partkey
+ and l_shipdate >= date '1995-09-01'
+ and l_shipdate < date '1995-09-01' + interval '1' month;
+
+--Q15
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ s_suppkey,
+ s_name,
+ s_address,
+ s_phone,
+ total_revenue
+from
+ supplier,
+ revenue0
+where
+ s_suppkey = supplier_no
+ and total_revenue = (
+ select
+ max(total_revenue)
+ from
+ revenue0
+ )
+order by
+ s_suppkey;
+
+--Q16
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ p_brand,
+ p_type,
+ p_size,
+ count(distinct ps_suppkey) as supplier_cnt
+from
+ partsupp,
+ part
+where
+ p_partkey = ps_partkey
+ and p_brand <> 'Brand#45'
+ and p_type not like 'MEDIUM POLISHED%'
+ and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
+ and ps_suppkey not in (
+ select
+ s_suppkey
+ from
+ supplier
+ where
+ s_comment like '%Customer%Complaints%'
+ )
+group by
+ p_brand,
+ p_type,
+ p_size
+order by
+ supplier_cnt desc,
+ p_brand,
+ p_type,
+ p_size;
+
+--Q17
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ sum(l_extendedprice) / 7.0 as avg_yearly
+from
+ lineitem join [broadcast]
+ part p1 on p1.p_partkey = l_partkey
+where
+ p1.p_brand = 'Brand#23'
+ and p1.p_container = 'MED BOX'
+ and l_quantity < (
+ select
+ 0.2 * avg(l_quantity)
+ from
+ lineitem join [broadcast]
+ part p2 on p2.p_partkey = l_partkey
+ where
+ l_partkey = p1.p_partkey
+ and p2.p_brand = 'Brand#23'
+ and p2.p_container = 'MED BOX'
+ );
+
+--Q18
+
+select /*+SET_VAR(exec_mem_limit=45899345920,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ c_name,
+ c_custkey,
+ t3.o_orderkey,
+ t3.o_orderdate,
+ t3.o_totalprice,
+ sum(t3.l_quantity)
+from
+customer join
+(
+ select * from
+ lineitem join
+ (
+ select * from
+ orders left semi join
+ (
+ select
+ l_orderkey
+ from
+ lineitem
+ group by
+ l_orderkey having sum(l_quantity) > 300
+ ) t1
+ on o_orderkey = t1.l_orderkey
+ ) t2
+ on t2.o_orderkey = l_orderkey
+) t3
+on c_custkey = t3.o_custkey
+group by
+ c_name,
+ c_custkey,
+ t3.o_orderkey,
+ t3.o_orderdate,
+ t3.o_totalprice
+order by
+ t3.o_totalprice desc,
+ t3.o_orderdate
+limit 100;
+
+--Q19
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ sum(l_extendedprice* (1 - l_discount)) as revenue
+from
+ lineitem,
+ part
+where
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#12'
+ and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
+ and l_quantity >= 1 and l_quantity <= 1 + 10
+ and p_size between 1 and 5
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ )
+ or
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#23'
+ and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
+ and l_quantity >= 10 and l_quantity <= 10 + 10
+ and p_size between 1 and 10
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ )
+ or
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#34'
+ and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
+ and l_quantity >= 20 and l_quantity <= 20 + 10
+ and p_size between 1 and 15
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ );
+
+--Q20
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true,
runtime_bloom_filter_size=551943) */
+s_name, s_address from
+supplier left semi join
+(
+ select * from
+ (
+ select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
+ from lineitem
+ where l_shipdate >= date '1994-01-01'
+ and l_shipdate < date '1994-01-01' + interval '1' year
+ group by l_partkey,l_suppkey
+ ) t2 join
+ (
+ select ps_partkey, ps_suppkey, ps_availqty
+ from partsupp left semi join part
+ on ps_partkey = p_partkey and p_name like 'forest%'
+ ) t1
+ on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
+ and t1.ps_availqty > t2.l_q
+) t3
+on s_suppkey = t3.ps_suppkey
+join nation
+where s_nationkey = n_nationkey
+ and n_name = 'CANADA'
+order by s_name;
+
+--Q21
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+s_name, count(*) as numwait
+from
+ lineitem l2 right semi join
+ (
+ select * from
+ lineitem l3 right anti join
+ (
+ select * from
+ orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus
= 'F'
+ join
+ (
+ select * from
+ supplier join nation
+ where s_nationkey = n_nationkey
+ and n_name = 'SAUDI ARABIA'
+ ) t1
+ where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
+ ) t2
+ on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and
l3.l_receiptdate > l3.l_commitdate
+ ) t3
+ on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey
+
+group by
+ t3.s_name
+order by
+ numwait desc,
+ t3.s_name
+limit 100;
+
+--Q22
+
+with tmp as (select
+ avg(c_acctbal) as av
+ from
+ customer
+ where
+ c_acctbal > 0.00
+ and substring(c_phone, 1, 2) in
+ ('13', '31', '23', '29', '30', '18', '17'))
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4,runtime_bloom_filter_size=4194304) */
+ cntrycode,
+ count(*) as numcust,
+ sum(c_acctbal) as totacctbal
+from
+ (
+ select
+ substring(c_phone, 1, 2) as cntrycode,
+ c_acctbal
+ from
+ orders right anti join customer c on o_custkey = c.c_custkey
join tmp on c.c_acctbal > tmp.av
+ where
+ substring(c_phone, 1, 2) in
+ ('13', '31', '23', '29', '30', '18', '17')
+ ) as custsale
+group by
+ cntrycode
+order by
+ cntrycode;
+```
+
+
+
diff --git a/docs/images/image-20220614114351241.png
b/docs/images/image-20220614114351241.png
deleted file mode 100644
index fe64d53ef3..0000000000
Binary files a/docs/images/image-20220614114351241.png and /dev/null differ
diff --git a/docs/images/ssb.png b/docs/images/ssb.png
new file mode 100644
index 0000000000..fd5c611777
Binary files /dev/null and b/docs/images/ssb.png differ
diff --git a/docs/images/ssb_flat.png b/docs/images/ssb_flat.png
new file mode 100644
index 0000000000..8a6dbc145c
Binary files /dev/null and b/docs/images/ssb_flat.png differ
diff --git a/docs/images/ssb_v11_v015_compare.png
b/docs/images/ssb_v11_v015_compare.png
deleted file mode 100644
index d2e2b5ba99..0000000000
Binary files a/docs/images/ssb_v11_v015_compare.png and /dev/null differ
diff --git a/docs/images/tpch.png b/docs/images/tpch.png
new file mode 100644
index 0000000000..41cba63a58
Binary files /dev/null and b/docs/images/tpch.png differ
diff --git a/docs/sidebars.json b/docs/sidebars.json
index eb3ce2ba7f..297940c885 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -1068,6 +1068,14 @@
"faq/data-faq",
"faq/sql-faq"
]
+ },
+ {
+ "type": "category",
+ "label": "Benchmark",
+ "items": [
+ "benchmark/ssb",
+ "benchmark/tpch"
+ ]
}
]
}
diff --git a/docs/zh-CN/docs/benchmark/ssb.md b/docs/zh-CN/docs/benchmark/ssb.md
index 701de4aa59..b775d75690 100644
--- a/docs/zh-CN/docs/benchmark/ssb.md
+++ b/docs/zh-CN/docs/benchmark/ssb.md
@@ -1,6 +1,6 @@
---
{
- "title": "Star-Schema-Benchmark 测试",
+ "title": "Star-Schema-Benchmark",
"language": "zh-CN"
}
---
@@ -28,15 +28,21 @@ under the License.
[Star Schema Benchmark(SSB)](https://www.cs.umb.edu/~poneil/StarSchemaB.PDF)
is a lightweight performance test set for data warehouse scenarios. Based on
[TPC-H](http://www.tpc.org/tpch/), SSB provides a simplified star schema
dataset, mainly used to test multi-table join query performance under a star
schema. In addition, the industry usually flattens SSB into a wide-table model
(hereinafter: SSB flat) to test query engine performance; see
[ClickHouse](https://clickhouse.com/docs/zh/getting-started/example-datasets/star-schema).
-This document mainly introduces the performance of Doris on the SSB test set.
+This document mainly introduces the performance of Apache Doris on the SSB 100G test set.
> Note 1: Standard test sets such as SSB usually differ greatly from real
> business scenarios, and some tests tune parameters for the test set, so
> benchmark results only reflect database performance in specific scenarios.
> Users are advised to run further tests with their own business data.
>
> Note 2: The operations in this document were performed on Ubuntu Server
> 20.04; they can also be run on CentOS 7.
-On the 13 queries of the SSB standard test dataset, we compared the upcoming
Doris 1.1 release with Doris 0.15.0 RC04; overall performance improved by 2-3 times.
+On the 13 queries of the SSB standard test dataset, we ran a comparison based
on Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04.
+
+On the SSB flat wide table, Apache Doris 1.2.0-rc01 delivers nearly 4x the
overall performance of Apache Doris 1.1.3 and nearly 10x that of Apache Doris 0.15.0 RC04.
+
+On the standard SSB test SQL, Apache Doris 1.2.0-rc01 delivers nearly 2x the
overall performance of Apache Doris 1.1.3 and nearly 31x that of Apache Doris 0.15.0 RC04.
+
## 1. Hardware Environment
@@ -51,128 +57,146 @@ under the License.
- Doris deployment: 3 BEs, 1 FE;
- Kernel: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
-- OS: Ubuntu Server 20.04 LTS 64-bit
-- Doris versions: Apache Doris 1.1, Apache Doris 0.15.0 RC04
+- OS: Ubuntu Server 20.04 LTS 64-bit
+- Doris versions: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0
RC04
- JDK: openjdk version "11.0.14" 2022-01-18
## 3. Test Data Volume
| SSB Table | Rows | Remarks |
| :------------- | :--------- | :--------------- |
-| lineorder | 600,037,902 | Line order fact table |
-| customer | 3,000,000 | Customer information table |
-| part | 1,400,000 | Part information table |
+| lineorder | 600,037,902 | Line order fact table |
+| customer | 3,000,000 | Customer information table |
+| part | 1,400,000 | Part information table |
| supplier | 200,000 | Supplier information table |
-| date | 2,556 | Date table |
-| lineorder_flat | 600,037,902 | Flattened wide table |
-
-## 4. Benchmark Results
-
-Here we compared the upcoming Doris-1.1 release with Doris-0.15.0 RC04; the results are as follows:
-
-| Query | Doris-1.1(ms) | Doris-0.15.0 RC04(ms) |
-| ----- | ------------- | --------------------- |
-| Q1.1 | 90 | 250 |
-| Q1.2 | 10 | 30 |
-| Q1.3 | 70 | 120 |
-| Q2.1 | 360 | 900 |
-| Q2.2 | 340 | 1020 |
-| Q2.3 | 260 | 770 |
-| Q3.1 | 550 | 1710 |
-| Q3.2 | 290 | 670 |
-| Q3.3 | 240 | 550 |
-| Q3.4 | 20 | 30 |
-| Q4.1 | 480 | 1250 |
-| Q4.2 | 240 | 400 |
-| Q4.3 | 200 | 330 |
+| date | 2,556 | Date table |
+| lineorder_flat | 600,037,902 | Flattened wide table |
+
+## 4. SSB Flat Wide Table Benchmark Results
+
+Here we compare Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04;
the results are as follows:
+
+
+| Query | Apache Doris 1.2.0-rc01(ms) | Apache Doris 1.1.3(ms) | Apache Doris
0.15.0 RC04(ms) |
+| ----- | ------------- | ------------- | ----------------- |
+| Q1.1 | 20 | 90 | 250 |
+| Q1.2 | 10 | 10 | 30 |
+| Q1.3 | 30 | 70 | 120 |
+| Q2.1 | 90 | 360 | 900 |
+| Q2.2 | 90 | 340 | 1020 |
+| Q2.3 | 60 | 260 | 770 |
+| Q3.1 | 160 | 550 | 1710 |
+| Q3.2 | 80 | 290 | 670 |
+| Q3.3 | 90 | 240 | 550 |
+| Q3.4 | 20 | 20 | 30 |
+| Q4.1 | 140 | 480 | 1250 |
+| Q4.2 | 50 | 240 | 400 |
+| Q4.3 | 30 | 200 | 330 |
+| Total | 880 | 3150 | 8030 |
**Result Notes**
- The dataset corresponds to scale 100, about 600 million rows.
- The test environment uses a common user configuration: 4 cloud servers with 16 cores, 64 GB RAM, and SSDs, deployed as 1 FE and 3 BEs.
- A common user configuration was chosen to reduce evaluation cost; the full hardware resources are not consumed during the test.
-- Results are the average of 3 runs, with data fully compacted (if tested
immediately after import, query latency may be higher than these results; compaction
speed is being continuously optimized and will improve significantly in the future).
-## 5. Environment Preparation
+## 5. Standard SSB Benchmark Results
+
+Here we compare Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04;
the results are as follows:
+
+| Query | Apache Doris 1.2.0-rc01(ms) | Apache Doris 1.1.3 (ms) | Apache
Doris 0.15.0 RC04(ms) |
+| ----- | ------- | ---------------------- | ------------------------------- |
+| Q1.1 | 40 | 18 | 350 |
+| Q1.2 | 30 | 100 | 80 |
+| Q1.3 | 20 | 70 | 80 |
+| Q2.1 | 350 | 940 | 20680 |
+| Q2.2 | 320 | 750 | 18250 |
+| Q2.3 | 300 | 720 | 14760 |
+| Q3.1 | 650 | 2150 | 22190 |
+| Q3.2 | 260 | 510 | 8360 |
+| Q3.3 | 220 | 450 | 6200 |
+| Q3.4 | 60 | 70 | 160 |
+| Q4.1 | 840 | 1480 | 24320 |
+| Q4.2 | 460 | 560 | 6310 |
+| Q4.3 | 610 | 660 | 10170 |
+| Total | 4160 | 8478 | 131910 |
-Please refer to the [official documentation](../install/install-deploy.md) to install and deploy Doris
and obtain a running Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs recommended).
+**Result Notes**
-You can add the following options to the BE configuration file be.conf and restart the BE for better query performance.
+- The dataset corresponds to scale 100, about 600 million rows.
+- The test environment uses a common user configuration: 4 cloud servers with 16 cores, 64 GB RAM, and SSDs, deployed as 1 FE and 3 BEs.
+- A common user configuration was chosen to reduce evaluation cost; the full hardware resources are not consumed during the test.
-```shell
-enable_storage_vectorization=true
-enable_low_cardinality_optimize=true
-```
-The scripts involved in this document are stored under `tools/ssb-tools/` in the Doris code repository.
+## 6. Environment Preparation
-> **Note:**
->
-> The two parameters above do not exist in 0.15.0 RC04 and do not need to be configured there.
+Please refer to the [official documentation](../install/install-deploy.md) to install and deploy Apache Doris and obtain a running
Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs recommended).
+
+The scripts involved in this document are stored in the Apache Doris
code repository: [ssb-tools](https://github.com/apache/doris/tree/master/tools/ssb-tools)
-## 6. Data Preparation
+## 7. Data Preparation
-### 6.1 Download and Install the SSB Data Generation Tool
+### 7.1 Download and Install the SSB Data Generation Tool
Run the following script to download and compile the [ssb-dbgen](https://github.com/electrum/ssb-dbgen.git) tool.
```shell
-bash bin/build-ssb-dbgen.sh
+sh build-ssb-dbgen.sh
```
-After successful installation, the `dbgen` binary will be generated under the `bin/ssb-dbgen/` directory.
+After successful installation, the `dbgen` binary will be generated under the `ssb-dbgen/` directory.
-### 6.2 Generate the SSB Test Set
+### 7.2 Generate the SSB Test Set
Run the following script to generate the SSB dataset:
```shell
-bash bin/gen-ssb-data.sh
+sh gen-ssb-data.sh -s 100 -c 100
```
-> Note 1: Run `bash bin/gen-ssb-data.sh -h` for script help. The default scale factor is 100 (sf100)
and 10 data files are generated by default, i.e. `bash bin/gen-ssb-data.sh -s 100 -c 10`, which takes a few minutes.
+> Note 1: Run `sh gen-ssb-data.sh -h` for script help.
>
-> Note 2: The data is generated with the `.tbl` suffix under the `bin/ssb-data/` directory. The total file size is about 60GB. Generation may take from a few minutes to 1
hour, after which the generated files are listed.
+> Note 2: The data is generated with the `.tbl` suffix under the `ssb-data/` directory. The total file size is about 60GB. Generation may take from a few minutes to 1 hour.
>
-> Note 3: `-s 100` sets the test set scale factor to 100, and `-c 10` generates the lineorder table data with 10 concurrent threads. The `-c`
parameter also determines the final number of lineorder files: the larger the parameter, the more files and the smaller each file. Use the defaults for sf100, and `-s
1000 -c 100` for sf1000.
+> Note 3: `-s 100` sets the test set scale factor to 100, and `-c 100` generates the lineorder table data with 100 concurrent threads. The `-c`
parameter also determines the final number of lineorder files: the larger the parameter, the more files and the smaller each file.
With the `-s 100` parameter, the generated dataset sizes are:
| Table | Rows | Size | File Number |
| --------- | ---------------- | ---- | ----------- |
-| lineorder | 600 million (600037902) | 60GB | 10 |
+| lineorder | 600 million (600037902) | 60GB | 100 |
| customer | 3 million (3000000) | 277M | 1 |
| part | 1.4 million (1400000) | 116M | 1 |
| supplier | 200 thousand (200000) | 17M | 1 |
-| dates | 2556 | 228K | 1 |
+| date | 2556 | 228K | 1 |
-### 6.3 Create Tables
+### 7.3 Create Tables
-#### 6.3.1 Prepare the `conf/doris-cluster.conf` file.
+#### 7.3.1 Prepare the `doris-cluster.conf` file.
-Before calling the import script, write the FE's ip, port, and other information into the `conf/doris-cluster.conf` file.
+Before calling the import script, write the FE's ip, port, and other information into the `doris-cluster.conf` file.
-The file contents include the FE's ip, HTTP port, username, password (empty by default), and the name of the DB to import into:
+The file sits at the same level as `load-ssb-dimension-data.sh`.
+
+The file contents include the FE's ip, HTTP port, username, password, and the name of the DB to import into:
```shell
-export FE_HOST="127.0.0.1"
+export FE_HOST="xxx"
export FE_HTTP_PORT="8030"
export FE_QUERY_PORT="9030"
export USER="root"
-export PASSWORD=""
+export PASSWORD='xxx'
export DB="ssb"
```
-#### 6.3.2 Run the following script to create the SSB tables:
+#### 7.3.2 Run the following script to create the SSB tables:
```shell
-bash bin/create-ssb-tables.sh
+sh create-ssb-tables.sh
```
+Or copy the table creation statements in
[create-ssb-tables.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql)
and
[create-ssb-flat-table.sql](https://github.com/apache/incubator-doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql)
and execute them in a MySQL client.
-Or copy the table creation statements in
[create-ssb-tables.sql](https://github.com/apache/doris/tree/master/tools/ssb-tools/ddl/create-ssb-tables.sql)
and execute them in Doris.
-Copy the table creation statements in
[create-ssb-flat-table.sql](https://github.com/apache/doris/tree/master/tools/ssb-tools/ddl/create-ssb-flat-table.sql)
and execute them in Doris.
-
-Below is the `lineorder_flat` table creation statement. The `bin/create-ssb-table.sh`
script above creates the `lineorder_flat` table with the default bucket number (48
buckets). You can drop this table and adjust the bucket number according to your cluster scale and node configuration to obtain better test results.
+Below is the `lineorder_flat` table creation statement. The `create-ssb-flat-table.sh` script above creates the
`lineorder_flat`
table with the default bucket number (48 buckets). You can drop this table and adjust the bucket number according to your cluster scale and node configuration to obtain better test results.
```sql
CREATE TABLE `lineorder_flat` (
@@ -234,27 +258,31 @@ PROPERTIES (
);
```
-### 6.4 Import Data
+### 7.4 Import Data
+
+We use the following command to import all SSB test set data and to synthesize the SSB flat wide table data and import it into the table.
-The following script connects to Doris using the parameters in `conf/doris-cluster.conf` and performs the import: the 4
smaller dimension tables (customer, part, supplier and date) are imported single-threaded, the 1 fact table (lineorder) is imported concurrently, and the wide table (lineorder_flat) is imported via 'INSERT
INTO ... SELECT ... '.
```shell
-bash bin/load-ssb-data.sh
+sh bin/load-ssb-data.sh -c 10
```
-> Note 1: Run `bash bin/load-ssb-data.sh -h` for script help. By default lineorder is imported with 5 concurrent threads, i.e. `-c 5`.
More threads speed up the import but add extra memory overhead.
+`-c 10` starts 10 concurrent import threads (default 5). With a single BE node, this imports the lineorder data generated by `sh gen-ssb-data.sh -s 100 -c 100`
and also generates the ssb-flat table data at the end. More threads speed up the import but add extra memory overhead.
+
+> Note:
>
-> Note 2: For faster imports, you can add `flush_thread_num_per_store=5` to be.conf and restart the
BE. This setting is the number of disk-write threads per data directory, default 2. Larger values improve write throughput but may increase IO Util. (Reference: with 1 mechanical disk, IO Util during import is about 12% at the default of 2
and about 26% at 5; with an SSD it is almost 0.)
+> 1. For faster imports, you can add `flush_thread_num_per_store=5`
to be.conf and restart the BE. This setting is the number of disk-write threads per data directory, default 2. Larger values improve write throughput but may increase IO
Util. (Reference: with 1 mechanical disk, IO Util during import is about 12% at the default of 2 and about 26% at 5; with an SSD
it is almost 0.)
>
-> Note 3: Importing the customer, part, supplier, date and lineorder tables takes 389s; flattening into
lineorder_flat takes 740s.
+> 2. The flat table data is imported via 'INSERT INTO ... SELECT ... '.
+
-### 6.5 Check Imported Data
+### 7.5 Check Imported Data
```sql
select count(*) from part;
select count(*) from customer;
select count(*) from supplier;
-select count(*) from dates;
+select count(*) from date;
select count(*) from lineorder;
select count(*) from lineorder_flat;
```
@@ -268,19 +296,17 @@ select count(*) from lineorder_flat;
| customer | 3 million (3000000) | 277 MB | 138.247 MB |
| part | 1.4 million (1400000) | 116 MB | 12.759 MB |
| supplier | 200 thousand (200000) | 17 MB | 9.143 MB |
-| dates | 2556 | 228 KB | 34.276 KB |
+| date | 2556 | 228 KB | 34.276 KB |
-### 6.6 Query Test
+### 7.6 Query Test
-#### 6.6.1 Test Scripts
+SSB flat query statements:
[ssb-flat-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-flat-queries)
-The following script connects to Doris using the parameters in `conf/doris-cluster.conf` and prints the row count of each table before running the queries.
-```shell
-bash bin/run-ssb-flat-queries.sh
-```
+Standard SSB query statements:
[ssb-queries](https://github.com/apache/doris/tree/master/tools/ssb-tools/ssb-queries)
+
+#### 7.6.1 SSB Flat Test SQL
-#### 6.6.2 Test SQL
```sql
--Q1.1
@@ -366,3 +392,222 @@ WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >=
19970101 AND LO_ORDERDATE <
GROUP BY YEAR, S_CITY, P_BRAND
ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
```
+
+
+
+#### **7.6.2 SSB Standard Test SQL**
+
+```sql
+--Q1.1
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_year = 1993
+ AND lo_discount BETWEEN 1 AND 3
+ AND lo_quantity < 25;
+--Q1.2
+SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_yearmonth = 'Jan1994'
+ AND lo_discount BETWEEN 4 AND 6
+ AND lo_quantity BETWEEN 26 AND 35;
+
+--Q1.3
+SELECT
+ SUM(lo_extendedprice * lo_discount) AS REVENUE
+FROM lineorder, dates
+WHERE
+ lo_orderdate = d_datekey
+ AND d_weeknuminyear = 6
+ AND d_year = 1994
+ AND lo_discount BETWEEN 5 AND 7
+ AND lo_quantity BETWEEN 26 AND 35;
+
+--Q2.1
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_category = 'MFGR#12'
+ AND s_region = 'AMERICA'
+GROUP BY d_year, p_brand
+ORDER BY p_brand;
+
+--Q2.2
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228'
+ AND s_region = 'ASIA'
+GROUP BY d_year, p_brand
+ORDER BY d_year, p_brand;
+
+--Q2.3
+SELECT SUM(lo_revenue), d_year, p_brand
+FROM lineorder, dates, part, supplier
+WHERE
+ lo_orderdate = d_datekey
+ AND lo_partkey = p_partkey
+ AND lo_suppkey = s_suppkey
+ AND p_brand = 'MFGR#2239'
+ AND s_region = 'EUROPE'
+GROUP BY d_year, p_brand
+ORDER BY d_year, p_brand;
+
+--Q3.1
+SELECT
+ c_nation,
+ s_nation,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'ASIA'
+ AND s_region = 'ASIA'
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_nation, s_nation, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.2
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND c_nation = 'UNITED STATES'
+ AND s_nation = 'UNITED STATES'
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.3
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND (
+ c_city = 'UNITED KI1'
+ OR c_city = 'UNITED KI5'
+ )
+ AND (
+ s_city = 'UNITED KI1'
+ OR s_city = 'UNITED KI5'
+ )
+ AND d_year >= 1992
+ AND d_year <= 1997
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q3.4
+SELECT
+ c_city,
+ s_city,
+ d_year,
+ SUM(lo_revenue) AS REVENUE
+FROM customer, lineorder, supplier, dates
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_orderdate = d_datekey
+ AND (
+ c_city = 'UNITED KI1'
+ OR c_city = 'UNITED KI5'
+ )
+ AND (
+ s_city = 'UNITED KI1'
+ OR s_city = 'UNITED KI5'
+ )
+ AND d_yearmonth = 'Dec1997'
+GROUP BY c_city, s_city, d_year
+ORDER BY d_year ASC, REVENUE DESC;
+
+--Q4.1
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=4,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ c_nation,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'AMERICA'
+ AND s_region = 'AMERICA'
+ AND (
+ p_mfgr = 'MFGR#1'
+ OR p_mfgr = 'MFGR#2'
+ )
+GROUP BY d_year, c_nation
+ORDER BY d_year, c_nation;
+
+--Q4.2
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ s_nation,
+ p_category,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND c_region = 'AMERICA'
+ AND s_region = 'AMERICA'
+ AND (
+ d_year = 1997
+ OR d_year = 1998
+ )
+ AND (
+ p_mfgr = 'MFGR#1'
+ OR p_mfgr = 'MFGR#2'
+ )
+GROUP BY d_year, s_nation, p_category
+ORDER BY d_year, s_nation, p_category;
+
+--Q4.3
+SELECT /*+SET_VAR(parallel_fragment_exec_instance_num=2,
enable_vectorized_engine=true, batch_size=4096,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ d_year,
+ s_city,
+ p_brand,
+ SUM(lo_revenue - lo_supplycost) AS PROFIT
+FROM dates, customer, supplier, part, lineorder
+WHERE
+ lo_custkey = c_custkey
+ AND lo_suppkey = s_suppkey
+ AND lo_partkey = p_partkey
+ AND lo_orderdate = d_datekey
+ AND s_nation = 'UNITED STATES'
+ AND (
+ d_year = 1997
+ OR d_year = 1998
+ )
+ AND p_category = 'MFGR#14'
+GROUP BY d_year, s_city, p_brand
+ORDER BY d_year, s_city, p_brand;
+```
diff --git a/docs/zh-CN/docs/benchmark/tpch.md
b/docs/zh-CN/docs/benchmark/tpch.md
index 66499eb4a7..7f9d12f855 100644
--- a/docs/zh-CN/docs/benchmark/tpch.md
+++ b/docs/zh-CN/docs/benchmark/tpch.md
@@ -1,6 +1,6 @@
---
{
- "title": "TPC-H benchmark",
+ "title": "TPC-H Benchmark",
"language": "zh-CN"
}
---
@@ -24,19 +24,19 @@ specific language governing permissions and limitations
under the License.
-->
-# TPC-H benchmark
+# TPC-H Benchmark
TPC-H is a decision support benchmark consisting of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have broad industry-wide relevance. The benchmark illustrates decision support systems that examine large volumes of data, execute highly complex queries, and answer critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size) and reflects multiple aspects of a system's ability to process queries, including the database size chosen for query execution, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users.
-This document mainly introduces the performance of Doris on the TPC-H test set.
+This document mainly introduces the performance of Doris on the TPC-H 100G test set.
> Note 1: Standard test sets such as TPC-H usually differ considerably from real business scenarios, and some tests tune parameters specifically for the test set. So the results of a standard test set only reflect database performance in those specific scenarios. We recommend that users run further tests with their own business data.
>
> Note 2: All operations covered in this document were tested on CentOS 7.x.
-On the 22 queries of the TPC-H standard test dataset, we ran a comparison between the upcoming Doris 1.1 release and Doris 0.15.0 RC04. Overall performance improved 3-4x, reaching more than 10x in individual scenarios.
+On the 22 queries of the TPC-H standard test dataset, we ran a comparison across Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04. Apache Doris 1.2.0-rc01 delivered nearly 3x the overall performance of Apache Doris 1.1.3, and nearly 11x that of Apache Doris 0.15.0 RC04.
-
+
## 1. Hardware Environment
@@ -53,12 +53,12 @@ TPC-H is a decision support benchmark consisting of a suite of
- Doris deployment: 3 BEs, 1 FE;
- Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
- OS version: CentOS 7.8
-- Doris versions: Apache Doris 1.1, Apache Doris 0.15.0 RC04
+- Doris versions: Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, Apache Doris 0.15.0 RC04
- JDK: openjdk version "11.0.14" 2022-01-18
## 3. Test Data Volume
-The test generated 100G of data and loaded it into Doris 0.15.0 RC04 and Doris 1.1 for testing. The tables and row counts are described below.
+The test generated the TPC-H 100G dataset and loaded it into Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04 for testing. The tables and row counts are described below.
| TPC-H Table Name | Rows | Size After Load | Remarks |
| :-------- | :----- | ---------- | :----------- |
@@ -73,11 +73,11 @@ TPC-H is a decision support benchmark consisting of a suite of
## 4. Test SQL
-The 22 TPC-H test queries: [TPCH-Query-SQL](https://github.com/apache/doris/tree/master/tools/tpch-tools/queries)
+The 22 TPC-H test queries: [TPCH-Query-SQL](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools/queries)
**Note:**
-The following four parameters in the SQL above do not exist in 0.15.0 RC04; remove them when running on 0.15.0 RC04:
+The following four parameters in the SQL above do not exist in Apache Doris 0.15.0 RC04; remove them when running there:
```
1. enable_vectorized_engine=true,
@@ -86,45 +86,42 @@ The 22 TPC-H test queries: [TPCH-Query-SQL](https://github.com/apache/dori
4. enable_projection=true
```
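For 0.15.0 RC04 the removal can be scripted rather than done by hand. A minimal sketch, assuming each query lives in its own `.sql` file (the helper name and file name below are only examples, not part of the official tooling):

```shell
# Strip the four 1.x-only session hints from a query file so it can run on
# 0.15.0 RC04. Parameter names come from the list above; strip_hints is a
# hypothetical helper, not an official script.
strip_hints() {
    sed -E \
        -e 's/enable_vectorized_engine=(true|false),? ?//g' \
        -e 's/batch_size=[0-9]+,? ?//g' \
        -e 's/enable_cost_based_join_reorder=(true|false),? ?//g' \
        -e 's/enable_projection=(true|false),? ?//g' \
        -e 's/, *\)/)/g' \
        "$1"
}
# Usage: strip_hints q1.sql > q1_rc04.sql
```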
-
-
## 5. Test Results
-Here we ran a comparison between the upcoming Doris-1.1 release and Doris-0.15.0 RC04. The results are as follows:
-
-| Query | Doris-1.1(s) | 0.15.0 RC04(s) |
-| --------- | ------------ | -------------- |
-| Q1 | 3.75 | 28.63 |
-| Q2 | 4.22 | 7.88 |
-| Q3 | 2.64 | 9.39 |
-| Q4 | 1.5 | 9.3 |
-| Q5 | 2.15 | 4.11 |
-| Q6 | 0.19 | 0.43 |
-| Q7 | 1.04 | 1.61 |
-| Q8 | 1.75 | 50.35 |
-| Q9 | 7.94 | 16.34 |
-| Q10 | 1.41 | 5.21 |
-| Q11 | 0.35 | 1.72 |
-| Q12 | 0.57 | 5.39 |
-| Q13 | 8.15 | 20.88 |
-| Q14 | 0.3 | |
-| Q15 | 0.66 | 1.86 |
-| Q16 | 0.79 | 1.32 |
-| Q17 | 1.51 | 26.67 |
-| Q18 | 3.364 | 11.77 |
-| Q19 | 0.829 | 1.71 |
-| Q20 | 2.77 | 5.2 |
-| Q21 | 4.47 | 10.34 |
-| Q22 | 0.9 | 3.22 |
-| **total** | **51.253** | **223.33** |
+Here we compared Apache Doris 1.2.0-rc01, Apache Doris 1.1.3, and Apache Doris 0.15.0 RC04. The results are as follows:
+
+| Query | Apache Doris 1.2.0-rc01 (s) | Apache Doris 1.1.3 (s) | Apache Doris 0.15.0 RC04 (s) |
+| -------- | --------------------------- | ---------------------- | ---------------------------- |
+| Q1 | 2.12 | 3.75 | 28.63 |
+| Q2 | 0.20 | 4.22 | 7.88 |
+| Q3 | 0.62 | 2.64 | 9.39 |
+| Q4 | 0.61 | 1.5 | 9.3 |
+| Q5 | 1.05 | 2.15 | 4.11 |
+| Q6 | 0.08 | 0.19 | 0.43 |
+| Q7 | 0.58 | 1.04 | 1.61 |
+| Q8 | 0.72 | 1.75 | 50.35 |
+| Q9 | 3.61 | 7.94 | 16.34 |
+| Q10 | 1.26 | 1.41 | 5.21 |
+| Q11 | 0.15 | 0.35 | 1.72 |
+| Q12 | 0.21 | 0.57 | 5.39 |
+| Q13 | 2.62 | 8.15 | 20.88 |
+| Q14 | 0.16 | 0.3 | |
+| Q15 | 0.30 | 0.66 | 1.86 |
+| Q16 | 0.38 | 0.79 | 1.32 |
+| Q17 | 0.65 | 1.51 | 26.67 |
+| Q18 | 2.28 | 3.364 | 11.77 |
+| Q19 | 0.20 | 0.829 | 1.71 |
+| Q20 | 0.21 | 2.77 | 5.2 |
+| Q21 | 1.17 | 4.47 | 10.34 |
+| Q22 | 0.46 | 0.9 | 3.22 |
+| **Total** | **19.64** | **51.253** | **223.33** |
**Notes on results**
- The dataset for these results is scale 100, about 600 million rows.
- The test environment is a common user configuration: 4 cloud servers, 16 cores and 64GB with SSD, deployed as 1 FE and 3 BEs.
- A common user configuration was chosen to lower the cost of evaluation and selection, but the whole test does not actually consume that much hardware.
- Each result is the average of 3 runs, with the data fully compacted (if you test immediately after loading, query latency may be higher than these results; compaction speed is being continuously optimized and will drop significantly in the future).
-- 0.15 RC04 failed to execute Q14 in the TPC-H test and could not complete the query.
+- Apache Doris 0.15 RC04 failed to execute Q14 in the TPC-H test and could not complete the query.
## 6. Environment Preparation
@@ -134,10 +131,10 @@ The 22 TPC-H test queries: [TPCH-Query-SQL](https://github.com/apache/dori
### 7.1 Download and Install the TPC-H Data Generation Tool
-Execute the following script to download and compile the [tpch-tools](https://github.com/apache/doris/tree/master/tools/tpch-tools) tool.
+Execute the following script to download and compile the [tpch-tools](https://github.com/apache/incubator-doris/tree/master/tools/tpch-tools) tool.
```shell
-sh bin/build-tpch-dbgen.sh
+sh build-tpch-dbgen.sh
```
After successful installation, the `dbgen` binary will be generated under the `TPC-H_Tools_v3.0.0/` directory.
@@ -147,12 +144,12 @@ sh bin/build-tpch-dbgen.sh
Execute the following script to generate the TPC-H dataset:
```shell
-sh bin/gen-tpch-data.sh
+sh gen-tpch-data.sh
```
-> Note 1: Run `sh bin/gen-tpch-data.sh -h` to see the script help.
+> Note 1: Run `sh gen-tpch-data.sh -h` to see the script help.
>
-> Note 2: The data is generated with the `.tbl` suffix under the `bin/tpch-data/` directory. The total file size is about 100GB. Generation may take anywhere from a few minutes to an hour.
+> Note 2: The data is generated with the `.tbl` suffix under the `tpch-data/` directory. The total file size is about 100GB. Generation may take anywhere from a few minutes to an hour.
>
> Note 3: A 100G standard test dataset is generated by default.
@@ -160,7 +157,9 @@ sh bin/gen-tpch-data.sh
#### 7.3.1 Prepare the `doris-cluster.conf` File
-Before calling the load script, write the FE ip, port, and other information into the `conf/doris-cluster.conf` file.
+Before calling the load script, write the FE ip, port, and other information into the `doris-cluster.conf` file.
+
+The file is located in the same directory as `load-tpch-data.sh`.
The file contents include the FE ip, HTTP port, username, password, and the name of the DB where the data will be loaded:
@@ -176,15 +175,15 @@ export USER='root'
# Doris password
export PASSWORD=''
# The database where TPC-H tables located
-export DB='tpch'
+export DB='tpch1'
```
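Before running the load, it can help to sanity-check what the conf resolves to. A small sketch, not part of the official scripts: given these variables, the load script streams each `.tbl` file to a Stream Load endpoint of the form below (`FE_HOST` and `FE_HTTP_PORT` are the conf's variable names for the FE ip and HTTP port; the values here are placeholders, and `lineitem` stands in for each table):

```shell
# Print the Stream Load endpoint implied by doris-cluster.conf.
# The host/port values below are placeholders; in practice they are
# sourced from the conf file rather than set inline.
export FE_HOST='127.0.0.1'
export FE_HTTP_PORT='8030'
export DB='tpch1'
echo "http://${FE_HOST}:${FE_HTTP_PORT}/api/${DB}/lineitem/_stream_load"
```

The actual load is performed by `load-tpch-data.sh`; this only shows where the data will go.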
#### 7.3.2 Execute the Following Script to Create the TPC-H Tables
```shell
-sh bin/create-tpch-tables.sh
+sh create-tpch-tables.sh
```
-Or copy the table creation statements in [create-tpch-tables.sql](https://github.com/apache/doris/blob/master/tools/tpch-tools/ddl/create-tpch-tables.sql) and execute them in Doris.
+Or copy the table creation statements in [create-tpch-tables.sql](https://github.com/apache/incubator-doris/blob/master/tools/tpch-tools/create-tpch-tables.sql) and execute them in Doris.
### 7.4 Load Data
@@ -192,12 +191,12 @@ sh bin/create-tpch-tables.sh
Run the following command to load the data:
```shell
-sh bin/load-tpch-data.sh
+sh ./load-tpch-data.sh
```
### 7.5 Verify the Loaded Data
-Execute the following SQL statements to check that the loaded data volume matches the figures above.
+Execute the following SQL statements to check whether the loaded data is consistent with the figures above.
```sql
select count(*) from lineitem;
@@ -213,10 +212,12 @@ select count(*) from revenue0;
### 7.6 Query Test
+#### 7.6.1 Run the Query Script
+
Execute the test SQL above, or run the following command:
```
-sh bin/run-tpch-queries.sh
+./run-tpch-queries.sh
```
> Note:
@@ -224,3 +225,655 @@ sh bin/run-tpch-queries.sh
>1. Doris's query optimizer and statistics features are not yet complete, so we rewrote some TPC-H queries to fit Doris's execution framework; this does not affect the correctness of the results
>
>2. Doris's new query optimizer will be released in a later version
+>3. Run `set exec_mem_limit=8G` before executing the queries
+
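To reproduce the per-query timings above without the script, a statement can be timed from the shell. A hedged sketch (the `mysql` connection details in the usage line are assumptions and must match your FE query port and credentials):

```shell
# Run a command three times and print the wall-clock milliseconds of each
# run; the benchmark numbers above are averages of three runs like this.
run3() {
    for i in 1 2 3; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
    done
}
# Hypothetical usage (host/port/db are examples):
#   run3 sh -c 'mysql -h127.0.0.1 -P9030 -uroot -Dtpch1 < q1.sql'
```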
+#### 7.6.2 Run a Single SQL Statement
+
+```sql
+--Q1
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=false) */
+ l_returnflag,
+ l_linestatus,
+ sum(l_quantity) as sum_qty,
+ sum(l_extendedprice) as sum_base_price,
+ sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
+ sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+ avg(l_quantity) as avg_qty,
+ avg(l_extendedprice) as avg_price,
+ avg(l_discount) as avg_disc,
+ count(*) as count_order
+from
+ lineitem
+where
+ l_shipdate <= date '1998-12-01' - interval '90' day
+group by
+ l_returnflag,
+ l_linestatus
+order by
+ l_returnflag,
+ l_linestatus;
+
+--Q2
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ s_acctbal,
+ s_name,
+ n_name,
+ p_partkey,
+ p_mfgr,
+ s_address,
+ s_phone,
+ s_comment
+from
+ partsupp join
+ (
+ select
+ ps_partkey as a_partkey,
+ min(ps_supplycost) as a_min
+ from
+ partsupp,
+ part,
+ supplier,
+ nation,
+ region
+ where
+ p_partkey = ps_partkey
+ and s_suppkey = ps_suppkey
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'EUROPE'
+ and p_size = 15
+ and p_type like '%BRASS'
+ group by a_partkey
+ ) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
+ part,
+ supplier,
+ nation,
+ region
+where
+ p_partkey = ps_partkey
+ and s_suppkey = ps_suppkey
+ and p_size = 15
+ and p_type like '%BRASS'
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'EUROPE'
+
+order by
+ s_acctbal desc,
+ n_name,
+ s_name,
+ p_partkey
+limit 100;
+
+--Q3
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true,
runtime_filter_wait_time_ms=10000) */
+ l_orderkey,
+ sum(l_extendedprice * (1 - l_discount)) as revenue,
+ o_orderdate,
+ o_shippriority
+from
+ (
+ select l_orderkey, l_extendedprice, l_discount, o_orderdate,
o_shippriority, o_custkey from
+ lineitem join orders
+ where l_orderkey = o_orderkey
+ and o_orderdate < date '1995-03-15'
+ and l_shipdate > date '1995-03-15'
+ ) t1 join customer c
+ on c.c_custkey = t1.o_custkey
+ where c_mktsegment = 'BUILDING'
+group by
+ l_orderkey,
+ o_orderdate,
+ o_shippriority
+order by
+ revenue desc,
+ o_orderdate
+limit 10;
+
+--Q4
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ o_orderpriority,
+ count(*) as order_count
+from
+ (
+ select
+ *
+ from
+ lineitem
+ where l_commitdate < l_receiptdate
+ ) t1
+ right semi join orders
+ on t1.l_orderkey = o_orderkey
+where
+ o_orderdate >= date '1993-07-01'
+ and o_orderdate < date '1993-07-01' + interval '3' month
+group by
+ o_orderpriority
+order by
+ o_orderpriority;
+
+--Q5
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ n_name,
+ sum(l_extendedprice * (1 - l_discount)) as revenue
+from
+ customer,
+ orders,
+ lineitem,
+ supplier,
+ nation,
+ region
+where
+ c_custkey = o_custkey
+ and l_orderkey = o_orderkey
+ and l_suppkey = s_suppkey
+ and c_nationkey = s_nationkey
+ and s_nationkey = n_nationkey
+ and n_regionkey = r_regionkey
+ and r_name = 'ASIA'
+ and o_orderdate >= date '1994-01-01'
+ and o_orderdate < date '1994-01-01' + interval '1' year
+group by
+ n_name
+order by
+ revenue desc;
+
+--Q6
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ sum(l_extendedprice * l_discount) as revenue
+from
+ lineitem
+where
+ l_shipdate >= date '1994-01-01'
+ and l_shipdate < date '1994-01-01' + interval '1' year
+ and l_discount between .06 - 0.01 and .06 + 0.01
+ and l_quantity < 24;
+
+--Q7
+select /*+SET_VAR(exec_mem_limit=458589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ supp_nation,
+ cust_nation,
+ l_year,
+ sum(volume) as revenue
+from
+ (
+ select
+ n1.n_name as supp_nation,
+ n2.n_name as cust_nation,
+ extract(year from l_shipdate) as l_year,
+ l_extendedprice * (1 - l_discount) as volume
+ from
+ supplier,
+ lineitem,
+ orders,
+ customer,
+ nation n1,
+ nation n2
+ where
+ s_suppkey = l_suppkey
+ and o_orderkey = l_orderkey
+ and c_custkey = o_custkey
+ and s_nationkey = n1.n_nationkey
+ and c_nationkey = n2.n_nationkey
+ and (
+ (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
+ or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
+ )
+ and l_shipdate between date '1995-01-01' and date '1996-12-31'
+ ) as shipping
+group by
+ supp_nation,
+ cust_nation,
+ l_year
+order by
+ supp_nation,
+ cust_nation,
+ l_year;
+
+--Q8
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ o_year,
+ sum(case
+ when nation = 'BRAZIL' then volume
+ else 0
+ end) / sum(volume) as mkt_share
+from
+ (
+ select
+ extract(year from o_orderdate) as o_year,
+ l_extendedprice * (1 - l_discount) as volume,
+ n2.n_name as nation
+ from
+ lineitem,
+ orders,
+ customer,
+ supplier,
+ part,
+ nation n1,
+ nation n2,
+ region
+ where
+ p_partkey = l_partkey
+ and s_suppkey = l_suppkey
+ and l_orderkey = o_orderkey
+ and o_custkey = c_custkey
+ and c_nationkey = n1.n_nationkey
+ and n1.n_regionkey = r_regionkey
+ and r_name = 'AMERICA'
+ and s_nationkey = n2.n_nationkey
+ and o_orderdate between date '1995-01-01' and date '1996-12-31'
+ and p_type = 'ECONOMY ANODIZED STEEL'
+ ) as all_nations
+group by
+ o_year
+order by
+ o_year;
+
+--Q9
+select/*+SET_VAR(exec_mem_limit=37179869184,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true,
enable_remove_no_conjuncts_runtime_filter_policy=true,
runtime_filter_wait_time_ms=100000) */
+ nation,
+ o_year,
+ sum(amount) as sum_profit
+from
+ (
+ select
+ n_name as nation,
+ extract(year from o_orderdate) as o_year,
+ l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as
amount
+ from
+ lineitem join orders on o_orderkey = l_orderkey
+ join[shuffle] part on p_partkey = l_partkey
+ join[shuffle] partsupp on ps_partkey = l_partkey
+ join[shuffle] supplier on s_suppkey = l_suppkey
+ join[broadcast] nation on s_nationkey = n_nationkey
+ where
+ ps_suppkey = l_suppkey and
+ p_name like '%green%'
+ ) as profit
+group by
+ nation,
+ o_year
+order by
+ nation,
+ o_year desc;
+
+--Q10
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ c_custkey,
+ c_name,
+ sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
+ c_acctbal,
+ n_name,
+ c_address,
+ c_phone,
+ c_comment
+from
+ customer,
+ (
+ select o_custkey,l_extendedprice,l_discount from lineitem, orders
+ where l_orderkey = o_orderkey
+ and o_orderdate >= date '1993-10-01'
+ and o_orderdate < date '1993-10-01' + interval '3' month
+ and l_returnflag = 'R'
+ ) t1,
+ nation
+where
+ c_custkey = t1.o_custkey
+ and c_nationkey = n_nationkey
+group by
+ c_custkey,
+ c_name,
+ c_acctbal,
+ c_phone,
+ n_name,
+ c_address,
+ c_comment
+order by
+ revenue desc
+limit 20;
+
+--Q11
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ ps_partkey,
+ sum(ps_supplycost * ps_availqty) as value
+from
+ partsupp,
+ (
+ select s_suppkey
+ from supplier, nation
+ where s_nationkey = n_nationkey and n_name = 'GERMANY'
+ ) B
+where
+ ps_suppkey = B.s_suppkey
+group by
+ ps_partkey having
+ sum(ps_supplycost * ps_availqty) > (
+ select
+ sum(ps_supplycost * ps_availqty) * 0.000002
+ from
+ partsupp,
+ (select s_suppkey
+ from supplier, nation
+ where s_nationkey = n_nationkey and n_name = 'GERMANY'
+ ) A
+ where
+ ps_suppkey = A.s_suppkey
+ )
+order by
+ value desc;
+
+--Q12
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ l_shipmode,
+ sum(case
+ when o_orderpriority = '1-URGENT'
+ or o_orderpriority = '2-HIGH'
+ then 1
+ else 0
+ end) as high_line_count,
+ sum(case
+ when o_orderpriority <> '1-URGENT'
+ and o_orderpriority <> '2-HIGH'
+ then 1
+ else 0
+ end) as low_line_count
+from
+ orders,
+ lineitem
+where
+ o_orderkey = l_orderkey
+ and l_shipmode in ('MAIL', 'SHIP')
+ and l_commitdate < l_receiptdate
+ and l_shipdate < l_commitdate
+ and l_receiptdate >= date '1994-01-01'
+ and l_receiptdate < date '1994-01-01' + interval '1' year
+group by
+ l_shipmode
+order by
+ l_shipmode;
+
+--Q13
+select /*+SET_VAR(exec_mem_limit=45899345920,
parallel_fragment_exec_instance_num=16, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ c_count,
+ count(*) as custdist
+from
+ (
+ select
+ c_custkey,
+ count(o_orderkey) as c_count
+ from
+ orders right outer join customer on
+ c_custkey = o_custkey
+ and o_comment not like '%special%requests%'
+ group by
+ c_custkey
+ ) as c_orders
+group by
+ c_count
+order by
+ custdist desc,
+ c_count desc;
+
+--Q14
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true,
runtime_filter_mode=OFF) */
+ 100.00 * sum(case
+ when p_type like 'PROMO%'
+ then l_extendedprice * (1 - l_discount)
+ else 0
+ end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
+from
+ part,
+ lineitem
+where
+ l_partkey = p_partkey
+ and l_shipdate >= date '1995-09-01'
+ and l_shipdate < date '1995-09-01' + interval '1' month;
+
+--Q15
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ s_suppkey,
+ s_name,
+ s_address,
+ s_phone,
+ total_revenue
+from
+ supplier,
+ revenue0
+where
+ s_suppkey = supplier_no
+ and total_revenue = (
+ select
+ max(total_revenue)
+ from
+ revenue0
+ )
+order by
+ s_suppkey;
+
+--Q16
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=8, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ p_brand,
+ p_type,
+ p_size,
+ count(distinct ps_suppkey) as supplier_cnt
+from
+ partsupp,
+ part
+where
+ p_partkey = ps_partkey
+ and p_brand <> 'Brand#45'
+ and p_type not like 'MEDIUM POLISHED%'
+ and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
+ and ps_suppkey not in (
+ select
+ s_suppkey
+ from
+ supplier
+ where
+ s_comment like '%Customer%Complaints%'
+ )
+group by
+ p_brand,
+ p_type,
+ p_size
+order by
+ supplier_cnt desc,
+ p_brand,
+ p_type,
+ p_size;
+
+--Q17
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=1, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ sum(l_extendedprice) / 7.0 as avg_yearly
+from
+ lineitem join [broadcast]
+ part p1 on p1.p_partkey = l_partkey
+where
+ p1.p_brand = 'Brand#23'
+ and p1.p_container = 'MED BOX'
+ and l_quantity < (
+ select
+ 0.2 * avg(l_quantity)
+ from
+ lineitem join [broadcast]
+ part p2 on p2.p_partkey = l_partkey
+ where
+ l_partkey = p1.p_partkey
+ and p2.p_brand = 'Brand#23'
+ and p2.p_container = 'MED BOX'
+ );
+
+--Q18
+
+select /*+SET_VAR(exec_mem_limit=45899345920,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+ c_name,
+ c_custkey,
+ t3.o_orderkey,
+ t3.o_orderdate,
+ t3.o_totalprice,
+ sum(t3.l_quantity)
+from
+customer join
+(
+ select * from
+ lineitem join
+ (
+ select * from
+ orders left semi join
+ (
+ select
+ l_orderkey
+ from
+ lineitem
+ group by
+ l_orderkey having sum(l_quantity) > 300
+ ) t1
+ on o_orderkey = t1.l_orderkey
+ ) t2
+ on t2.o_orderkey = l_orderkey
+) t3
+on c_custkey = t3.o_custkey
+group by
+ c_name,
+ c_custkey,
+ t3.o_orderkey,
+ t3.o_orderdate,
+ t3.o_totalprice
+order by
+ t3.o_totalprice desc,
+ t3.o_orderdate
+limit 100;
+
+--Q19
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=false,
enable_cost_based_join_reorder=false, enable_projection=true) */
+ sum(l_extendedprice* (1 - l_discount)) as revenue
+from
+ lineitem,
+ part
+where
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#12'
+ and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
+ and l_quantity >= 1 and l_quantity <= 1 + 10
+ and p_size between 1 and 5
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ )
+ or
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#23'
+ and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
+ and l_quantity >= 10 and l_quantity <= 10 + 10
+ and p_size between 1 and 10
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ )
+ or
+ (
+ p_partkey = l_partkey
+ and p_brand = 'Brand#34'
+ and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
+ and l_quantity >= 20 and l_quantity <= 20 + 10
+ and p_size between 1 and 15
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ );
+
+--Q20
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=2, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true,
runtime_bloom_filter_size=551943) */
+s_name, s_address from
+supplier left semi join
+(
+ select * from
+ (
+ select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
+ from lineitem
+ where l_shipdate >= date '1994-01-01'
+ and l_shipdate < date '1994-01-01' + interval '1' year
+ group by l_partkey,l_suppkey
+ ) t2 join
+ (
+ select ps_partkey, ps_suppkey, ps_availqty
+ from partsupp left semi join part
+ on ps_partkey = p_partkey and p_name like 'forest%'
+ ) t1
+ on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
+ and t1.ps_availqty > t2.l_q
+) t3
+on s_suppkey = t3.ps_suppkey
+join nation
+where s_nationkey = n_nationkey
+ and n_name = 'CANADA'
+order by s_name;
+
+--Q21
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4, enable_vectorized_engine=true,
batch_size=4096, disable_join_reorder=true,
enable_cost_based_join_reorder=true, enable_projection=true) */
+s_name, count(*) as numwait
+from
+ lineitem l2 right semi join
+ (
+ select * from
+ lineitem l3 right anti join
+ (
+ select * from
+ orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus
= 'F'
+ join
+ (
+ select * from
+ supplier join nation
+ where s_nationkey = n_nationkey
+ and n_name = 'SAUDI ARABIA'
+ ) t1
+ where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
+ ) t2
+ on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and
l3.l_receiptdate > l3.l_commitdate
+ ) t3
+ on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey
+
+group by
+ t3.s_name
+order by
+ numwait desc,
+ t3.s_name
+limit 100;
+
+--Q22
+
+with tmp as (select
+ avg(c_acctbal) as av
+ from
+ customer
+ where
+ c_acctbal > 0.00
+ and substring(c_phone, 1, 2) in
+ ('13', '31', '23', '29', '30', '18', '17'))
+
+select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4,runtime_bloom_filter_size=4194304) */
+ cntrycode,
+ count(*) as numcust,
+ sum(c_acctbal) as totacctbal
+from
+ (
+ select
+ substring(c_phone, 1, 2) as cntrycode,
+ c_acctbal
+ from
+ orders right anti join customer c on o_custkey = c.c_custkey
join tmp on c.c_acctbal > tmp.av
+ where
+ substring(c_phone, 1, 2) in
+ ('13', '31', '23', '29', '30', '18', '17')
+ ) as custsale
+group by
+ cntrycode
+order by
+ cntrycode;
+
+```
+
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]