[jira] [Resolved] (PARQUET-389) Filter predicates should work with missing columns

Ryan Blue (JIRA) Fri, 15 Jul 2016 09:55:32 -0700

     [ https://issues.apache.org/jira/browse/PARQUET-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ryan Blue resolved PARQUET-389.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0

Merged #359. Thanks [~lian cheng] and [~dweeks-netflix] for reviewing!

> Filter predicates should work with missing columns
> --------------------------------------------------
>
>                 Key: PARQUET-389
>                 URL: https://issues.apache.org/jira/browse/PARQUET-389
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0
>            Reporter: Cheng Lian
>            Assignee: Ryan Blue
>             Fix For: 1.9.0
>
>
> This issue originates from SPARK-11103, which contains detailed information 
> about how to reproduce it.
> The major problem is that pushed-down filter predicates assert that every
> column they reference exists in the target physical file. This does not hold
> when schema merging is used, because individual files may lack columns that
> appear in the merged schema.
> This assertion is unnecessary: if a column is missing from the file schema,
> the column is treated as if every value were null, and every filter can be
> evaluated accordingly. For example, if {{a = 1}} is pushed down but {{a}} is
> missing from the underlying physical file, all records in that file should be
> dropped, since {{a}} is always null. Conversely, if {{a IS NULL}} is pushed
> down, all records should be preserved.
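A minimal sketch of the null-filling rule described above, written in Java because parquet-mr is a Java project. It does not use the parquet-mr filter2 API: MissingColumnFilter, ColumnPredicate, eq, isNull, isNotNull and canDropFile are hypothetical names chosen for illustration. It assumes SQL-style null handling, where a comparison such as a = 1 never matches a null value. The key idea is that a missing column is all nulls, so the verdict for the whole file is simply the predicate evaluated once on null.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (NOT the parquet-mr API): decide whether a file can be
 * skipped when a single-column filter references a column the file lacks.
 */
public class MissingColumnFilter {

  /** Hypothetical single-column predicate: a column name plus a test on its value. */
  static final class ColumnPredicate {
    final String column;
    final Predicate<Object> test; // applied to the column value, which may be null

    ColumnPredicate(String column, Predicate<Object> test) {
      this.column = column;
      this.test = test;
    }
  }

  // a = value: assumed SQL-style semantics, so a null value never matches.
  static ColumnPredicate eq(String column, Object value) {
    return new ColumnPredicate(column, v -> v != null && v.equals(value));
  }

  // a IS NULL
  static ColumnPredicate isNull(String column) {
    return new ColumnPredicate(column, v -> v == null);
  }

  // a IS NOT NULL
  static ColumnPredicate isNotNull(String column) {
    return new ColumnPredicate(column, v -> v != null);
  }

  /**
   * Returns true if every record in the file can be dropped.
   * If the column is missing from the file schema, every value is null, so the
   * whole file matches exactly when the predicate matches null. If the column is
   * present, this sketch cannot decide and keeps the file (normal filtering applies).
   */
  static boolean canDropFile(Set<String> fileColumns, ColumnPredicate p) {
    if (fileColumns.contains(p.column)) {
      return false;
    }
    return !p.test.test(null);
  }

  public static void main(String[] args) {
    // Physical file written with an older schema: column "a" does not exist.
    Set<String> fileColumns = new HashSet<>(Arrays.asList("b", "c"));
    System.out.println(canDropFile(fileColumns, eq("a", 1)));     // true: a is always null
    System.out.println(canDropFile(fileColumns, isNull("a")));    // false: every record matches
    System.out.println(canDropFile(fileColumns, isNotNull("a"))); // true: no record matches
    System.out.println(canDropFile(fileColumns, eq("b", 1)));     // false: b exists, cannot decide here
  }
}
```

Instead of asserting that the column exists, the sketch treats absence as a well-defined case, which is the behavior the issue asks for.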



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
