[
https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-13193.
-------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
Resolving this. Thanks [~tangzhi]!
> RuntimeFilter on parquet dictionary should evaluate null values
> ---------------------------------------------------------------
>
> Key: IMPALA-13193
> URL: https://issues.apache.org/jira/browse/IMPALA-13193
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2,
> Impala 4.3.0, Impala 4.4.0
> Reporter: Quanlong Huang
> Assignee: Zhi Tang
> Priority: Critical
> Labels: correctness
> Fix For: Impala 4.5.0
>
>
> IMPALA-10910 and IMPALA-5509 introduce an optimization that evaluates runtime
> filters on parquet dictionary values. If none of the values pass the check,
> the whole row group is skipped. However, NULL values are not included in
> the parquet dictionary, so a runtime filter that accepts NULL values might
> incorrectly reject the row group when none of the dictionary values pass
> the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
> on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows but now it returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
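
The failure mode described above can be modeled in a few lines. This is a simplified sketch, not Impala's backend code: the function names, the `has_nulls` flag, and the `accepts` predicate are all hypothetical, and the "fixed" variant only illustrates one way the NULL case could be handled.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for a runtime filter: returns True if a probe-side
# column value (possibly NULL, modeled as None) could match the build side.
Filter = Callable[[Optional[str]], bool]


def can_skip_row_group_buggy(dictionary: Iterable[str], accepts: Filter) -> bool:
    """Skip the row group if no dictionary entry passes the filter.

    NULLs are never stored in the dictionary, so they are never checked.
    """
    return not any(accepts(value) for value in dictionary)


def can_skip_row_group_fixed(
    dictionary: Iterable[str], has_nulls: bool, accepts: Filter
) -> bool:
    """Same check, but also evaluate the filter on NULL when the column
    chunk may contain NULLs (has_nulls is an assumed input here)."""
    if has_nulls and accepts(None):
        return False
    return not any(accepts(value) for value in dictionary)


# Model of the repro: the build side (dim_tbl) holds a single NULL, so the
# filter over COALESCE(name, '') contains only the empty string.
build_side = {""}


def accepts(value: Optional[str]) -> bool:
    coalesced = "" if value is None else value
    return coalesced in build_side


# parq_tbl.name holds "abc", NULL, NULL, "abc": the dictionary is just ["abc"].
dictionary = ["abc"]
buggy_skip = can_skip_row_group_buggy(dictionary, accepts)
fixed_skip = can_skip_row_group_fixed(dictionary, True, accepts)
```

In this model `buggy_skip` is true, so the row group holding the two NULL rows is dropped, while `fixed_skip` is false and the rows are kept for the join.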
--
This message was sent by Atlassian Jira
(v8.20.10#820010)