Github user blrunner commented on the pull request:
https://github.com/apache/tajo/pull/994#issuecomment-205623205
Here are my benchmark results.
# Configuration
* EC2 instance type : c3.xlarge
* Cluster: 1 master, 3 workers
* Dataset: TPC-H (factor = 1)
* Partitioned table schema
```
CREATE EXTERNAL TABLE lineitem_p (l_orderkey INT8, l_partkey INT8,
l_suppkey INT8, l_linenumber INT8, l_quantity FLOAT8, l_extendedprice FLOAT8,
l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT, l_linestatus TEXT,
l_commitdate TEXT, l_receiptdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT,
l_comment TEXT)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN(l_shipdate TEXT)
LOCATION 's3://Xyz';
```
* Number of partitions in the ``lineitem`` table: 2526 (each partition contains
just one file)
# Queries
* Q1: `` select * from lineitem_p limit 5; ``
* Q2: `` select count(*) from lineitem_p; ``
* Q3: `` select count(*) from lineitem_p where l_shipdate > '1994-09-25'
and l_shipdate < '1994-10-10'; ``
# Query Execution Time
Query | Not Optimized | Optimized | Improvement
-------------------|----------------------|--------------------------|-------------------
Q1 | 573.425 sec | 4.228 sec | 135.6x
Q2 | 653.175 sec | 33.444 sec | 19.5x
Q3 | 4.099 sec | 2.429 sec | 1.6x
# Split Computation Time
Query | Not Optimized | Optimized | Improvement
-------------------|----------------------|--------------------------|-------------------
Q1 | 572921 ms | 2233 ms | 256.5x
Q2 | 599437 ms | 701 ms | 855.1x
Q3 | 2537 ms | 388 ms | 6.5x
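
As a quick sanity check on the two tables above, here is a minimal Python sketch that recomputes the improvement factors and also shows what share of each query's total time went to split computation. The helper names `speedup` and `split_share` are hypothetical and exist only for this illustration; the numbers are copied from the tables.

```python
# Timings copied from the two tables above.
# Query execution time is in seconds; split computation time is in milliseconds.
query_time_sec = {
    "Q1": (573.425, 4.228),
    "Q2": (653.175, 33.444),
    "Q3": (4.099, 2.429),
}
split_time_ms = {
    "Q1": (572921, 2233),
    "Q2": (599437, 701),
    "Q3": (2537, 388),
}


def speedup(before, after):
    """Return how many times faster `after` is than `before` (hypothetical helper)."""
    return before / after


def split_share(split_ms, total_sec):
    """Return the fraction of total query time spent computing splits."""
    return (split_ms / 1000.0) / total_sec


for q in sorted(query_time_sec):
    q_before, q_after = query_time_sec[q]
    s_before, s_after = split_time_ms[q]
    print(
        f"{q}: query {speedup(q_before, q_after):.1f}x, "
        f"splits {speedup(s_before, s_after):.1f}x, "
        f"split share {split_share(s_before, q_before):.1%} -> "
        f"{split_share(s_after, q_after):.1%}"
    )
```

The Improvement columns appear to be truncated rather than rounded, so the printed factors can differ from the tables in the last digit. The split-share figures suggest that, without the optimization, Q1 and Q2 spent nearly all of their time computing splits.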