[ https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zihao Ye reassigned IMPALA-13193:
---------------------------------
Assignee: Zhi Tang
> RuntimeFilter on parquet dictionary should evaluate null values
> ---------------------------------------------------------------
>
> Key: IMPALA-13193
> URL: https://issues.apache.org/jira/browse/IMPALA-13193
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2,
> Impala 4.3.0, Impala 4.4.0
> Reporter: Quanlong Huang
> Assignee: Zhi Tang
> Priority: Critical
>
> IMPALA-10910 and IMPALA-5509 introduce an optimization that evaluates runtime
> filters on parquet dictionary values. If none of the values pass the check,
> the whole row group is skipped. However, NULL values are not included in
> the parquet dictionary. A runtime filter that accepts NULL values might
> therefore incorrectly reject a row group that contains NULLs when none of its
> dictionary values pass the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
> on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows (id 1 and id 2, whose NULL names
> coalesce to ''), but it currently returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
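
The skip decision described in the issue can be sketched in a few lines of
Python. This is only an illustration of the logic, not Impala's backend code:
the function names, the Filter type, and the way the runtime filter is modeled
as a plain predicate are all assumptions made for the example. The inputs
mirror the repro: the dictionary for parq_tbl.name holds only "abc", two rows
are NULL, and the build side of the join contributes a single '' value.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a runtime filter modeled as a predicate over a nullable string.
Filter = Callable[[Optional[str]], bool]


def skip_dictionary_only(dict_values: Sequence[str], accepts: Filter) -> bool:
    """Reported behavior: skip the row group when no dictionary entry passes."""
    return not any(accepts(v) for v in dict_values)


def skip_with_null_check(dict_values: Sequence[str], null_count: int,
                         accepts: Filter) -> bool:
    """Expected behavior: also evaluate NULL, which is never a dictionary entry."""
    if any(accepts(v) for v in dict_values):
        return False
    # NULLs live outside the dictionary, so test them separately.
    if null_count > 0 and accepts(None):
        return False
    return True


def coalesce_filter(value: Optional[str]) -> bool:
    # Models COALESCE(p.name, '') = COALESCE(d.name, '') with dim_tbl holding one NULL.
    build_side = {""}
    return ("" if value is None else value) in build_side


# parq_tbl.name: dictionary ["abc"], two NULL rows.
print(skip_dictionary_only(["abc"], coalesce_filter))     # True: row group wrongly skipped
print(skip_with_null_check(["abc"], 2, coalesce_filter))  # False: row group is kept
```

With the NULL check in place, a row group without NULLs is still skipped when
no dictionary entry passes, so the optimization is preserved for that case.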