[ 
https://issues.apache.org/jira/browse/IMPALA-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-11147.
----------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> Min/max filtering crashes on Parquet file that contains partition columns
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11147
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11147
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Impala can crash on a Parquet file that contains the partition columns.
> Data files usually don't contain the partition columns, so Impala doesn't 
> expect to find such columns in the data files. Unfortunately, min/max 
> filtering generates a SEGFAULT when the partition column is present in the 
> data files.
> This happens because FindSkipRangesForPagesWithMinMaxFilters() tries to 
> retrieve the Parquet schema element for a given slot descriptor. When the 
> slot descriptor refers to a partition column, we usually don't find a schema 
> element so we don't try to skip pages.
> But when the partition column is present in the data file, the code tries to 
> calculate the filtered pages in the column. It uses the column reader object 
> corresponding to the column, but this is null for partition columns, hence we 
> get a SEGFAULT.
> The code shouldn't do anything at the page level for partition columns, as 
> the data in such columns is the same for the whole file and it is already 
> filtered at a higher level.
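
The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Impala patch: SlotDesc, ColumnReader, FileState and CountPageFilteredColumns are hypothetical stand-ins invented for this example. Only the idea comes from the description, namely to skip page-level work for partition columns and to never dereference a column reader that does not exist.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's real classes.
struct SlotDesc {
  std::string col_name;
  bool is_partition_col;
};

struct ColumnReader {
  int num_pages;
};

struct FileState {
  // Columns physically present in the Parquet file schema.
  std::map<std::string, int> schema_index;
  // Readers exist only for materialized, non-partition columns.
  std::map<std::string, ColumnReader> readers;
};

// Returns the number of columns for which page-level min/max filtering
// would be attempted.
int CountPageFilteredColumns(const FileState& file,
                             const std::vector<SlotDesc>& slots) {
  int evaluated = 0;
  for (const SlotDesc& slot : slots) {
    // Partition values are constant within a file and are already
    // filtered at a higher level, so there is nothing to do per page.
    if (slot.is_partition_col) continue;
    // The column is not in the file schema: no pages to skip.
    if (file.schema_index.find(slot.col_name) == file.schema_index.end()) {
      continue;
    }
    // Defensive check: a missing reader must not be dereferenced.
    auto it = file.readers.find(slot.col_name);
    if (it == file.readers.end()) continue;
    if (it->second.num_pages > 0) ++evaluated;
  }
  return evaluated;
}
```

In this sketch, a file whose schema contains both a partition column and a regular column results in page-level filtering for the regular column only, even though the partition column has a schema entry and no reader.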



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
