Cheng Lian created PARQUET-389:
----------------------------------

             Summary: Filter predicates should work with missing columns
                 Key: PARQUET-389
                 URL: https://issues.apache.org/jira/browse/PARQUET-389
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.8.0, 1.7.0, 1.6.0
            Reporter: Cheng Lian


This issue originates from SPARK-11103, which contains detailed information 
about how to reproduce it.

The main problem is that pushed-down filter predicates assert that every 
column they reference exists in the target physical file. That assumption does 
not hold under schema merging, where some files may lack columns that appear 
in the merged schema.

This assertion is unnecessary. If a column is missing from a file's schema, 
the column can be treated as filled with nulls, and every filter can be 
evaluated accordingly. For example, if we push down {{a = 1}} but {{a}} is 
missing in the underlying physical file, all records in this file should be 
dropped, since {{a}} is always null. On the other hand, if we push down 
{{a IS NULL}}, all records should be preserved.
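
The expected behaviour can be illustrated with a small, self-contained Python 
sketch. It is not parquet-mr code: the helper names (eq, is_null, 
filter_records, can_drop_file) are hypothetical and exist only to show the 
"missing column reads as null" rule described above.

```python
from typing import Any, Callable, Dict, List, Optional, Set

Record = Dict[str, Any]
Predicate = Callable[[Optional[Any]], bool]


def eq(value: Any) -> Predicate:
    # SQL-style equality: a null value never satisfies "column = value".
    return lambda v: v is not None and v == value


def is_null() -> Predicate:
    # Matches only null values.
    return lambda v: v is None


def filter_records(records: List[Record], column: str,
                   predicate: Predicate) -> List[Record]:
    # dict.get returns None when the column is absent, so a missing
    # column is evaluated exactly as if it were filled with nulls.
    return [r for r in records if predicate(r.get(column))]


def can_drop_file(file_columns: Set[str], column: str,
                  predicate: Predicate) -> bool:
    # File-level decision: if the column is absent, every value is null,
    # so the predicate's result on None decides for the whole file.
    if column in file_columns:
        return False  # the column exists, so records must be inspected
    return not predicate(None)


# A file written without column "a".
records = [{"b": 1}, {"b": 2}]
dropped = filter_records(records, "a", eq(1))       # a = 1
kept = filter_records(records, "a", is_null())      # a IS NULL
```

Here {{a = 1}} keeps no records and {{a IS NULL}} keeps all of them, without 
any assertion that {{a}} exists in the file.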



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
