[
https://issues.apache.org/jira/browse/IMPALA-11147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-11147.
----------------------------------------
Fix Version/s: Impala 4.1.0
Resolution: Fixed
> Min/max filtering crashes on Parquet file that contains partition columns
> -------------------------------------------------------------------------
>
> Key: IMPALA-11147
> URL: https://issues.apache.org/jira/browse/IMPALA-11147
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Fix For: Impala 4.1.0
>
>
> Impala can crash on a Parquet file that contains the partition columns.
> Data files usually don't contain the partition columns, so Impala doesn't
> expect to find such columns in the data files. Unfortunately min/max
> filtering generates a SEGFAULT when the partition column is present in the
> data files.
> It happens because FindSkipRangesForPagesWithMinMaxFilters() tries to
> retrieve the Parquet schema element for a given slot descriptor. When the
> slot descriptor refers to a partition column, we usually don't find a schema
> element, so we don't try to skip pages.
> But when the partition column is present in the data file, the code tries to
> calculate the filtered pages in the column. It uses the column reader object
> corresponding to the column, but this is null for partition columns, hence we
> get a SEGFAULT.
> The code shouldn't do anything at the page level for partition columns, as
> the data in such columns is the same for the whole file and is already
> filtered at a higher level.
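
The failure mode and the guard described above can be sketched in C++ as follows. This is only a minimal illustration, not the actual Impala patch: ColumnReader, SlotInfo and CountSkippablePages are hypothetical stand-ins, not Impala's real classes or functions. The only assumptions taken from the description are that partition columns have no column reader (a null pointer here) and that page-level filtering should skip them entirely.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's real classes.
struct ColumnReader {
  int pages_skipped_by_filter;  // pretend result of page-level min/max evaluation
};

struct SlotInfo {
  bool is_partition_column;    // slot refers to a partition column
  bool found_in_file_schema;   // the Parquet file schema has a matching element
  const ColumnReader* reader;  // null for partition columns (assumption from the report)
};

// Sums the pages that page-level min/max filtering would skip across all slots.
int CountSkippablePages(const std::vector<SlotInfo>& slots) {
  int skipped = 0;
  for (const SlotInfo& slot : slots) {
    // Guard: partition values are constant for the whole file and are already
    // filtered at a higher level, so do nothing at the page level. Without
    // this check, a partition column that is also stored in the data file
    // would reach the dereference below with a null reader.
    if (slot.is_partition_column) continue;
    // No schema element in the file: nothing to evaluate for this slot.
    if (!slot.found_in_file_schema) continue;
    assert(slot.reader != nullptr);
    skipped += slot.reader->pages_skipped_by_filter;
  }
  return skipped;
}
```

In this sketch, a slot with is_partition_column and found_in_file_schema both set and a null reader models the crashing input from the report. The early continue is what keeps it away from the dereference.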
--
This message was sent by Atlassian Jira
(v8.20.1#820001)