[ https://issues.apache.org/jira/browse/PARQUET-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Blue resolved PARQUET-389.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.9.0
Merged #359. Thanks [~lian cheng] and [~dweeks-netflix] for reviewing!
> Filter predicates should work with missing columns
> --------------------------------------------------
>
> Key: PARQUET-389
> URL: https://issues.apache.org/jira/browse/PARQUET-389
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.6.0, 1.7.0, 1.8.0
> Reporter: Cheng Lian
> Assignee: Ryan Blue
> Fix For: 1.9.0
>
>
> This issue originates from SPARK-11103, which contains detailed information
> about how to reproduce it.
> The main problem is that pushed-down filter predicates assert that every
> column they reference exists in the target physical file. This does not hold
> when schemas are merged.
> This assertion is unnecessary: if a column is missing from the file schema,
> it is treated as filled with nulls, and every filter can act accordingly. For
> example, if we push down {{a = 1}} but {{a}} is missing from the underlying
> physical file, all records in that file should be dropped, since {{a}} is
> always null. Conversely, if we push down {{a IS NULL}}, all records should be
> preserved.
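
The semantics described in the issue can be illustrated with a small, self-contained sketch. It does not use the parquet-mr filter API and is not the merged fix: the class `MissingColumnSketch`, the `Eq` predicate type, and the `canDropFile` helper are hypothetical names introduced only for illustration, and a null value in `Eq` stands in for {{a IS NULL}}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MissingColumnSketch {
    /** Hypothetical predicate "column = value"; a null value models "column IS NULL". */
    public static final class Eq {
        final String column;
        final Object value;

        public Eq(String column, Object value) {
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Returns true when no record in the file can match the predicate,
     * judging only by which columns the file contains.
     */
    public static boolean canDropFile(Eq predicate, Set<String> fileColumns) {
        if (fileColumns.contains(predicate.column)) {
            // The column exists, so the schema alone cannot rule the file out.
            return false;
        }
        // A missing column is treated as all nulls: "a = 1" matches nothing,
        // while "a IS NULL" matches every record.
        return predicate.value != null;
    }

    public static void main(String[] args) {
        // Assumed file schema for the example: column "a" is absent.
        Set<String> fileColumns = new HashSet<>(Arrays.asList("b", "c"));

        System.out.println(canDropFile(new Eq("a", 1), fileColumns));    // true: drop all records
        System.out.println(canDropFile(new Eq("a", null), fileColumns)); // false: keep all records
    }
}
```

The point of the sketch is that a missing column leads to a decision (drop everything or keep everything) rather than a failed assertion.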