Joe McDonnell created IMPALA-12363:
--------------------------------------
Summary: Upgrade re2 to version 2023-03-01 or higher
Key: IMPALA-12363
URL: https://issues.apache.org/jira/browse/IMPALA-12363
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell
There has been a lot of development on Google's re2 since the version that we
currently use (20190301). In a prototype using version 2023-03-01, it seems to
help TPC-H Q13, which has an "o_comment not like '%special%requests%'" predicate:
{noformat}
(I) Improvement: TPCH(42) TPCH-Q13 [parquet / none / none] (5.26s -> 4.77s [-9.43%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| 03:AGGREGATE        | 8.84%      | 478.98ms | 503.19ms | -4.81%     | 1.74%     | 642.76ms | 695.25ms | -7.55%     | 3      | 15    | 6.30M  | 6.22M     |
| 02:HASH JOIN        | 9.35%      | 506.60ms | 532.76ms | -4.91%     | 1.49%     | 664.59ms | 738.50ms | -10.01%    | 3      | 15    | 64.42M | 6.38M     |
| F00:EXCHANGE SENDER | 38.39%     | 2.08s    | 1.99s    | +4.49%     | 0.87%     | 2.39s    | 2.28s    | +4.77%     | 3      | 15    | -1     | -1        |
| 01:SCAN HDFS        | 38.93%     | 2.11s    | 2.64s    | -20.17%    | 0.88%     | 2.37s    | 2.99s    | -20.87%    | 3      | 15    | 62.32M | 6.30M     |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
{noformat}
This was measured with the following query options:
mt_dop=5,runtime_filter_min_size=8192,runtime_filter_max_size=2097152,max_num_runtime_filters=50,runtime_filter_wait_time_ms=10000
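For context on why the regex library matters for this predicate: a LIKE
pattern with an interior wildcard, such as '%special%requests%', cannot be
reduced to a plain prefix, suffix, or substring check, so it is presumably
evaluated as a regular expression on every scanned row. The sketch below is
only an illustration of that translation. It is not Impala's code, the helper
names LikeToRegex, Like, and NotLike are hypothetical, and std::regex stands in
for re2 so that the example builds without extra dependencies.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not Impala's implementation): translate a SQL LIKE
// pattern into an equivalent regular expression. '%' matches any run of
// characters and '_' matches exactly one character. LIKE escape characters
// are not handled in this sketch.
std::string LikeToRegex(const std::string& like) {
  // Characters that carry special meaning in a regex and must be escaped.
  static const std::string kMeta = R"(\^$.|?*+()[]{})";
  std::string out;
  for (char c : like) {
    if (c == '%') {
      out += ".*";
    } else if (c == '_') {
      out += '.';
    } else {
      if (kMeta.find(c) != std::string::npos) out += '\\';
      out += c;
    }
  }
  return out;
}

// Evaluates `value LIKE pattern`. std::regex_match succeeds only if the whole
// string matches, which mirrors LIKE semantics. Note that '.' in std::regex
// does not match line terminators, so multi-line values would need extra care.
// A real engine would compile the regex once per query, not once per row.
bool Like(const std::string& value, const std::string& pattern) {
  const std::regex re(LikeToRegex(pattern));
  return std::regex_match(value, re);
}

// Evaluates `value NOT LIKE pattern`, the form used by TPC-H Q13.
bool NotLike(const std::string& value, const std::string& pattern) {
  return !Like(value, pattern);
}
```

With this sketch, LikeToRegex("%special%requests%") yields
".*special.*requests.*". The per-row cost of matching that expression is where
a faster regex library would be expected to show up in the SCAN HDFS operator.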
Versions after 2023-03-01 take a dependency on Abseil. They may bring further
improvements (for example, some std::unordered_map structures are replaced with
Abseil's hash table). We can look into those versions, but adopting them is a
bit more work than moving to 2023-03-01.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)