[ 
https://issues.apache.org/jira/browse/IMPALA-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033776#comment-17033776
 ] 

ASF subversion and git services commented on IMPALA-8110:
---------------------------------------------------------

Commit ebc2c366f5780a89f09eb2014ca94f9d970f50b4 in impala's branch 
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ebc2c36 ]

IMPALA-8110: Fix Parquet min/max filters for narrowed integer types

This patch validates the paired min/max stats values for tinyint
and smallint columns when reading min/max column stats from a
Parquet file.
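
A minimal sketch of what such a validation could look like, written in
Python for brevity. It is not the code from the patch: the names
INT_RANGES, stats_usable and can_skip_row_group are hypothetical, and the
rule used here (ignore the stats unless both the min and the max fit in
the narrowed type's range) is an assumption about the approach.

```python
# Illustrative sketch only; names and the exact rule are assumptions,
# not the actual Impala implementation.
INT_RANGES = {
    "tinyint": (-128, 127),
    "smallint": (-32768, 32767),
}


def stats_usable(stat_min: int, stat_max: int, sql_type: str) -> bool:
    """Return True if int32 min/max stats can be trusted for the narrower type."""
    lo, hi = INT_RANGES[sql_type]
    return lo <= stat_min <= stat_max <= hi


def can_skip_row_group(stat_min: int, stat_max: int, sql_type: str,
                       upper_bound: int) -> bool:
    """Decide whether the predicate `col < upper_bound` allows skipping the row group."""
    if not stats_usable(stat_min, stat_max, sql_type):
        return False  # stats may not describe the narrowed values: read the data
    return stat_min >= upper_bound  # no value can be below the bound
```

With stats min=1 and max=201 on a column now declared tinyint,
can_skip_row_group(1, 201, "tinyint", 0) returns False, so the row group
is read and the overflowed value is not filtered out.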

Testing:
 - Added automated test cases in parquet-stats.test for columns whose
   data type has been changed from int to tinyint, from smallint to
   tinyint, and from int to smallint.
 - Passed EE tests.
 - Passed all core tests.

Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Reviewed-on: http://gerrit.cloudera.org:8080/15087
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Parquet stat filtering does not handle narrowed int types correctly
> -------------------------------------------------------------------
>
>                 Key: IMPALA-8110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8110
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Wenzhe Zhou
>            Priority: Critical
>              Labels: correctness, parquet
>
> Impala can read int32 Parquet columns as tinyint/smallint SQL columns. If a 
> value does not fit into the range of the 8/16-bit signed integer, it will 
> overflow, e.g. writing 128 as int32 and then rereading it as int8 will return 
> -128. This is normal as far as I understand, but min/max stat filtering does 
> not handle this case correctly:
> create table tnarrow (i int) stored as parquet;
> insert into tnarrow values (1), (201); 
> alter table tnarrow change column i i tinyint;
> set PARQUET_READ_STATISTICS=0;
> select * from tnarrow where i < 0;
> -> returns 1 row: -55
> set PARQUET_READ_STATISTICS=1;
> select * from tnarrow where i < 0;
> -> returns 0 rows
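
The mismatch in the example above can be reproduced outside Impala with a
short Python sketch. It assumes the narrowing read keeps only the low
8 bits and interprets them as two's complement, which matches the
128 -> -128 behaviour described in the issue; wrap_signed is a
hypothetical helper, not an Impala function.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1        # keep only the low bits
    if value >= 1 << (bits - 1):    # sign bit set: the value is negative
        value -= 1 << bits
    return value


stored = [1, 201]                                # int32 values written to the file
decoded = [wrap_signed(v, 8) for v in stored]    # what a tinyint reader sees

# The stats were computed on the stored int32 values, so min(stored) >= 0
# suggests that no row can satisfy `i < 0` ...
stats_say_no_match = min(stored) >= 0
# ... yet one decoded value is negative and does satisfy the predicate.
rows_actually_match = any(v < 0 for v in decoded)
```

Both flags end up True, which is exactly the contradiction between the
two query results: the stats describe the stored int32 values, not the
values the narrowed column returns.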
