Fang-Yu Rao created IMPALA-14116:
------------------------------------

             Summary: Consider erroring out earlier if NULL is on the IN-list 
of a table scan against an ORC table
                 Key: IMPALA-14116
                 URL: https://issues.apache.org/jira/browse/IMPALA-14116
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Fang-Yu Rao
            Assignee: Fang-Yu Rao
         Attachments: resolved_crashed_thread.txt

We found that if NULL is included in the IN-list of a table scan against an 
ORC table, Impala daemons can crash. This can be reproduced with the 
following steps.

# Create the database and an ORC table under the database in impala-shell.
{code}
create database test_db_04;

CREATE EXTERNAL TABLE test_db_04.test_tbl_01 (customer_id STRING) 
PARTITIONED BY (ingest_date STRING) 
WITH SERDEPROPERTIES ('serialization.format'='1') 
STORED AS ORC;
{code}
# Insert a row into the ORC table just created via beeline.
{code}
INSERT INTO test_db_04.test_tbl_01 partition (ingest_date='2025-05-29') values 
('CUST001');
{code}
# Execute the following query via impala-shell.
{code}
SELECT ingest_date, customer_id
FROM test_db_04.test_tbl_01 WHERE ingest_date > DATE '2024-09-30' AND 
customer_id IN ('', NULL)
GROUP BY 1, 2;
{code}

An Impala daemon crashes during the execution of the ORC table scan. The 
stack trace of the crashed thread from the resolved minidump is provided in 
[^resolved_crashed_thread.txt].

We should consider erroring out earlier, possibly during query analysis, if 
NULL is in the IN-list of a table scan against an ORC table, so that no 
Impala daemon crashes.
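
Until such a check exists, one possible workaround is to drop NULL from the 
IN-list. Under SQL's three-valued logic, a NULL entry in an IN-list can never 
make the predicate TRUE, so in a WHERE clause the query below should return 
the same rows as the one in step 3. That it avoids the crash is an 
assumption here and has not been verified.
{code}
-- Same query as in step 3, with NULL removed from the IN-list.
-- customer_id IN ('', NULL) is TRUE only when customer_id = '' and is
-- NULL otherwise, and WHERE discards rows for which the predicate is NULL.
SELECT ingest_date, customer_id
FROM test_db_04.test_tbl_01 WHERE ingest_date > DATE '2024-09-30' AND 
customer_id IN ('')
GROUP BY 1, 2;
{code}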


