Zoltán Borók-Nagy created IMPALA-11147:
------------------------------------------

             Summary: Min/max filtering crashes on Parquet file that contains partition columns
                 Key: IMPALA-11147
                 URL: https://issues.apache.org/jira/browse/IMPALA-11147
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Zoltán Borók-Nagy


Impala can crash on a Parquet file that contains the partition columns.

Data files usually don't contain the partition columns, so Impala doesn't expect
to find such columns in them. Unfortunately, min/max filtering causes a SEGFAULT
when a partition column is present in a data file.

It happens because FindSkipRangesForPagesWithMinMaxFilters() tries to retrieve
the Parquet schema element for a given slot descriptor. When the slot
descriptor refers to a partition column, we usually don't find a schema element,
so we don't try to skip pages.

But when the partition column is present in the data file, a schema element is
found and the code tries to calculate the pages to filter in that column. To do
so it uses the column reader object corresponding to the column, but that
pointer is null for partition columns, hence we get a SEGFAULT.
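A minimal C++ model of the failure path described above may make the order of the checks clearer. Everything in it is a hypothetical simplification: SlotDesc, SchemaElement, ColumnReader and ModelPreFixPath() are illustrative names, not Impala's real types or the actual body of FindSkipRangesForPagesWithMinMaxFilters(). The only early exit is a missing schema element; instead of dereferencing the null reader, the model returns kNullReader at the point where the real code would crash.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the scanner state. These are NOT
// Impala's real slot descriptor, Parquet schema element or column reader types.
struct SlotDesc {
  std::string col_name;
  bool is_partition_col;  // deliberately not consulted below, mirroring the bug
};
struct SchemaElement {
  std::string name;
};
struct ColumnReader {
  int num_pages;
};

enum class PreFixOutcome {
  kNoSchemaElement,  // early exit: column not in the file, no pages are skipped
  kNullReader,       // crash case: the real code would dereference a null reader
  kPagesConsidered   // normal case: a reader exists, page filtering proceeds
};

// Models the pre-fix control flow: the only early exit is "no schema element".
PreFixOutcome ModelPreFixPath(
    const SlotDesc& slot,
    const std::map<std::string, SchemaElement>& file_schema,
    const std::map<std::string, ColumnReader*>& readers) {
  if (file_schema.find(slot.col_name) == file_schema.end()) {
    // The usual situation for partition columns: nothing to do.
    return PreFixOutcome::kNoSchemaElement;
  }
  // No reader is created for partition columns, so this yields nullptr for them.
  auto it = readers.find(slot.col_name);
  ColumnReader* reader = (it == readers.end()) ? nullptr : it->second;
  if (reader == nullptr) {
    return PreFixOutcome::kNullReader;
  }
  return PreFixOutcome::kPagesConsidered;
}
```

For a "year" partition slot, a file schema without "year" gives kNoSchemaElement, while a file schema that also contains "year" makes the same slot reach kNullReader.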

The code shouldn't do anything at the page level for partition columns: a
partition column holds the same value for the whole file, and that value is
already filtered at a higher level.
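One way to express the behavior described above is to test whether the slot is a partition column before looking at the schema or the column reader at all. This is only a sketch: SlotDesc, SchemaElement, ColumnReader and ShouldFilterPages() are hypothetical, simplified names, and this is not the actual patch for IMPALA-11147.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins; not Impala's real types or signatures.
struct SlotDesc {
  std::string col_name;
  bool is_partition_col;
};
struct SchemaElement {
  std::string name;
};
struct ColumnReader {
  int num_pages;
};

// Returns true if page-level min/max filtering should be attempted for `slot`:
// only for non-partition columns that are in the file and have a non-null reader.
bool ShouldFilterPages(const SlotDesc& slot,
                       const std::map<std::string, SchemaElement>& file_schema,
                       const std::map<std::string, ColumnReader*>& readers) {
  // Partition columns hold one value per file and are already filtered at a
  // higher level, so skip them even when the data file happens to contain them.
  if (slot.is_partition_col) return false;
  // Column not present in this file: no pages to skip.
  if (file_schema.find(slot.col_name) == file_schema.end()) return false;
  // Defensive check: never hand a null reader to the page-filtering code.
  auto it = readers.find(slot.col_name);
  return it != readers.end() && it->second != nullptr;
}
```

Checking is_partition_col first makes the result independent of whether the writer included the partition column in the data file; the trailing null check is an extra safeguard, not something the issue prescribes.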



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
