Joe McDonnell created IMPALA-12363:
--------------------------------------

             Summary: Upgrade re2 to version 2023-03-01 or higher
                 Key: IMPALA-12363
                 URL: https://issues.apache.org/jira/browse/IMPALA-12363
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.3.0
            Reporter: Joe McDonnell


There has been a lot of development on Google's re2 since the version that we 
currently use (20190301). In a prototype using version 2023-03-01, it seems to 
help TPC-H Q13, which has a "o_comment not like '%special%requests%'" predicate. 
The query ran about 9% faster, with most of the gain in the HDFS scan:
{noformat}
(I) Improvement: TPCH(42) TPCH-Q13 [parquet / none / none] (5.26s -> 4.77s [-9.43%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| 03:AGGREGATE        | 8.84%      | 478.98ms | 503.19ms | -4.81%     |   1.74%   | 642.76ms | 695.25ms | -7.55%     | 3      | 15    | 6.30M  | 6.22M     |
| 02:HASH JOIN        | 9.35%      | 506.60ms | 532.76ms | -4.91%     |   1.49%   | 664.59ms | 738.50ms | -10.01%    | 3      | 15    | 64.42M | 6.38M     |
| F00:EXCHANGE SENDER | 38.39%     | 2.08s    | 1.99s    | +4.49%     |   0.87%   | 2.39s    | 2.28s    | +4.77%     | 3      | 15    | -1     | -1        |
| 01:SCAN HDFS        | 38.93%     | 2.11s    | 2.64s    | -20.17%    |   0.88%   | 2.37s    | 2.99s    | -20.87%    | 3      | 15    | 62.32M | 6.30M     |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
{noformat}
This is with the query options 
mt_dop=5,runtime_filter_min_size=8192,runtime_filter_max_size=2097152,max_num_runtime_filters=50,runtime_filter_wait_time_ms=10000.
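
To illustrate why a regex library upgrade could affect this query, here is a 
minimal sketch of how a LIKE pattern such as '%special%requests%' can be 
translated into a regular expression. It assumes the backend evaluates 
multi-wildcard LIKE patterns through a regex engine. It uses Python's re 
module purely as a stand-in for re2, and like_to_regex and not_like are 
hypothetical helper names, not Impala functions.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression.

    '%' matches any run of characters, '_' matches exactly one character,
    and everything else is matched literally. Escape-character handling is
    omitted to keep the sketch short.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def not_like(value: str, pattern: str) -> bool:
    """Return True when value does NOT match the LIKE pattern."""
    # fullmatch anchors the regex at both ends, as LIKE does; DOTALL lets
    # the wildcards match newlines as well.
    regex = like_to_regex(pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is None


q13_regex = like_to_regex("%special%requests%")
```

Because the pattern has two interior wildcards, it cannot be reduced to a 
single substring search, so each o_comment value goes through the regex 
matcher. That is consistent with the gain showing up mostly in the scan.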

Beyond 2023-03-01, re2 takes an Abseil dependency. Those later versions may 
have further improvements (they replace some std::unordered_map structures with 
Abseil's hash table). We can look into them, but that is a little more work 
than upgrading to 2023-03-01.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
