[ 
https://issues.apache.org/jira/browse/IMPALA-13193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zihao Ye reassigned IMPALA-13193:
---------------------------------

    Assignee: Zhi Tang

> RuntimeFilter on parquet dictionary should evaluate null values
> ---------------------------------------------------------------
>
>                 Key: IMPALA-13193
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13193
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, 
> Impala 4.3.0, Impala 4.4.0
>            Reporter: Quanlong Huang
>            Assignee: Zhi Tang
>            Priority: Critical
>
> IMPALA-10910 and IMPALA-5509 introduced an optimization that evaluates
> runtime filters on parquet dictionary values. If none of the values pass the
> check, the whole row group is skipped. However, NULL values are not stored in
> the parquet dictionary, so a runtime filter that accepts NULL values might
> incorrectly reject a row group that contains NULLs when none of its
> dictionary values pass the check.
> Here are steps to reproduce the bug:
> {code:sql}
> create table parq_tbl (id bigint, name string) stored as parquet;
> insert into parq_tbl values (0, "abc"), (1, NULL), (2, NULL), (3, "abc");
> create table dim_tbl (name string);
> insert into dim_tbl values (NULL);
> select * from parq_tbl p join dim_tbl d
>   on COALESCE(p.name, '') = COALESCE(d.name, '');{code}
> The SELECT query should return 2 rows, but it currently returns 0 rows.
> A workaround is to disable this optimization:
> {code:sql}
> set PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT=0;{code}
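
The failure mode described above can be illustrated with a small model. This is only a sketch, not Impala's backend code: the function names, the `consider_nulls` flag, and the use of a plain Python set in place of the real runtime filter structure are all assumptions made for illustration. It shows why checking only the dictionary entries skips a row group whose NULL rows would have matched, and how also evaluating NULL avoids that.

```python
from typing import Callable, Optional, Sequence


def coalesce_empty(value: Optional[str]) -> str:
    """Mirrors COALESCE(name, '') from the join condition in the repro."""
    return "" if value is None else value


def can_skip_row_group(
    dictionary: Sequence[str],
    has_nulls: bool,
    filter_accepts: Callable[[Optional[str]], bool],
    consider_nulls: bool,
) -> bool:
    """Return True if no candidate value passes the filter.

    Hypothetical helper: with consider_nulls=False it models the reported
    behavior (only dictionary entries are checked); with consider_nulls=True
    it also evaluates NULL when the row group contains NULL values.
    """
    candidates: list = list(dictionary)
    if consider_nulls and has_nulls:
        candidates.append(None)
    return not any(filter_accepts(v) for v in candidates)


# Build side: dim_tbl holds a single NULL, so COALESCE(d.name, '') yields ''.
# A set stands in for whatever filter structure the backend actually builds.
build_side = {coalesce_empty(None)}


def runtime_filter(value: Optional[str]) -> bool:
    return coalesce_empty(value) in build_side


# Probe side: parq_tbl.name has dictionary ["abc"]; the NULLs are not in it.
dictionary = ["abc"]

skipped_without_null_check = can_skip_row_group(
    dictionary, has_nulls=True, filter_accepts=runtime_filter, consider_nulls=False
)
skipped_with_null_check = can_skip_row_group(
    dictionary, has_nulls=True, filter_accepts=runtime_filter, consider_nulls=True
)
skipped_when_no_nulls = can_skip_row_group(
    dictionary, has_nulls=False, filter_accepts=runtime_filter, consider_nulls=True
)
```

In this model the row group is skipped when only the dictionary is checked, is kept once NULL is evaluated as well, and is still skipped when the row group has no NULL values, so the optimization is preserved for the common case.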



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
