Csaba Ringhofer created IMPALA-8110:
---------------------------------------
Summary: Parquet stat filtering does not handle narrowed int types correctly
Key: IMPALA-8110
URL: https://issues.apache.org/jira/browse/IMPALA-8110
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
Impala can read int32 Parquet columns as tinyint/smallint SQL columns. If a
value does not fit into the 8-bit or 16-bit signed integer range, it
overflows: e.g. writing 128 as int32 and then reading it back as int8 returns
-128. As far as I understand this is expected, but min/max statistics
filtering does not handle this case correctly:
create table tnarrow (i int) stored as parquet;
insert into tnarrow values (1), (201);
alter table tnarrow change column i i tinyint;
set PARQUET_READ_STATISTICS=0;
select * from tnarrow where i < 0;
-> returns 1 row: -55
set PARQUET_READ_STATISTICS=1;
select * from tnarrow where i < 0;
-> returns 0 rows
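A minimal C++ sketch of why the two runs disagree, assuming that narrowing keeps the low 8 bits (the wrap-around described above) and that pruning for `i < 0` looks only at the narrowed min statistic. The function names below are hypothetical and this is a simplified model, not Impala's actual backend code. The guarded variant shows one possible fix: trust the stats only when the whole int32 [min, max] range fits into int8, and otherwise read the row group.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: read an int32 value as TINYINT by keeping the low
// 8 bits (two's-complement wrap), e.g. 128 -> -128. Well-defined since
// C++20; implementation-defined (wrapping on mainstream compilers) before.
int8_t NarrowToInt8(int32_t v) { return static_cast<int8_t>(v); }

// Hypothetical model of the faulty check for `col < literal`: narrow the
// int32 min statistic and prune the row group if it is >= literal.
// Returns true if the row group may contain matching rows.
bool NaiveMayMatchLessThan(int32_t stat_min, int8_t literal) {
  return NarrowToInt8(stat_min) < literal;
}

// Hypothetical guarded check: narrowed stats are only meaningful when every
// value in [stat_min, stat_max] fits into int8. Otherwise some values wrapped
// around, so the row group cannot be pruned.
bool GuardedMayMatchLessThan(int32_t stat_min, int32_t stat_max,
                             int8_t literal) {
  assert(stat_min <= stat_max);  // min/max statistics are ordered
  const int32_t lo = std::numeric_limits<int8_t>::min();
  const int32_t hi = std::numeric_limits<int8_t>::max();
  if (stat_min < lo || stat_max > hi) return true;  // cannot prune safely
  return static_cast<int8_t>(stat_min) < literal;
}
```

With the statistics from the reproduction (min 1, max 201) and the literal 0, `NaiveMayMatchLessThan(1, 0)` is false, so the row group is skipped even though 201 reads back as a negative tinyint, while `GuardedMayMatchLessThan(1, 201, 0)` is true because 201 is outside the int8 range.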
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)