[ 
https://issues.apache.org/jira/browse/IMPALA-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-6538.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0
                   Impala 3.0

Fixed by 
https://github.com/apache/impala/commit/881e00a8bff0469ab7860bcd0d4d4794fb04a4b8

> Fix read path when Parquet min(_value)/max(_value) statistics contain NaN
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-6538
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6538
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 3.0, Impala 2.12.0
>
>
> (Below I write only min and max, but the same applies to min_value and 
> max_value.)
> When both min and max are NaN:
>  * Written by Impala:
>  ** the first element in the row group is NaN, but not all of them are 
> (Impala writer bug)
>  ** all elements are NaN
>  * Written by Hive/Parquet-mr:
>  ** all elements are NaN
> Either min or max is NaN, but not both:
>  * Written by Impala:
>  ** this cannot happen currently
>  * Written by Hive/Parquet-mr:
>  ** only the max can be NaN (needs to be checked)
> Therefore, if both min and max are NaN, we can't use the statistics for 
> filtering.
> If only the max is NaN, we still have a valid lower bound.
>  
> A workaround can be to change the NaNs to infinities, i.e. max => Inf, 
> min => -Inf
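
The workaround quoted above can be sketched roughly as follows. This is only an illustration in Python, not the code from the fix commit; the function names widen_nan_bounds and can_skip_for_less_than are hypothetical. A NaN bound is widened to the matching infinity, so a row group with both statistics NaN ends up with the range (-Inf, Inf) and is never skipped, while a row group with only max = NaN keeps its lower bound.

```python
import math


def widen_nan_bounds(min_stat, max_stat):
    """Replace NaN statistics with infinities (hypothetical helper).

    min = NaN becomes -Inf and max = NaN becomes +Inf, so a NaN bound
    simply stops constraining the range instead of poisoning comparisons.
    """
    lower = -math.inf if math.isnan(min_stat) else min_stat
    upper = math.inf if math.isnan(max_stat) else max_stat
    return lower, upper


def can_skip_for_less_than(min_stat, max_stat, c):
    """Return True if the predicate `x < c` cannot match in this row group.

    Only the lower bound matters for `x < c`: if every non-NaN value is
    >= c, nothing can match (NaN itself never satisfies `x < c`).
    """
    lower, _ = widen_nan_bounds(min_stat, max_stat)
    return lower >= c


nan = math.nan
both_nan = widen_nan_bounds(nan, nan)        # no usable bounds
only_max_nan = widen_nan_bounds(1.0, nan)    # lower bound survives
```

With these definitions, can_skip_for_less_than(1.0, nan, 0.5) still allows the row group to be skipped, whereas can_skip_for_less_than(nan, nan, 0.5) does not.
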
> Based on my experiments, min/max statistics are not applied to predicates 
> that can be true for NaN, e.g. 'NOT x < 3'
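
To illustrate why predicates that can be true for NaN must not be filtered with min/max statistics, here is a small Python sketch. It relies on Python floats following IEEE 754 comparison rules (any ordered comparison with NaN is false), and it assumes, for the sake of the example, that the statistics were computed over the non-NaN values only. The row group contents are made up.

```python
import math

# Hypothetical row group contents.
values = [1.0, 2.0, math.nan]

# Assumption for this sketch: statistics ignore NaN values.
non_nan = [v for v in values if not math.isnan(v)]
stat_min, stat_max = min(non_nan), max(non_nan)


def matches(x):
    """The predicate 'NOT x < 3'. It is true for NaN because NaN < 3 is false."""
    return not (x < 3)


# Rows that really satisfy the predicate: only the NaN row.
matching_rows = [v for v in values if matches(v)]

# A naive stats-based check would reason: max < 3, so every value is < 3,
# so 'NOT x < 3' matches nothing and the row group can be skipped.
naive_skip = stat_max < 3
```

Here naive_skip comes out true even though matching_rows is not empty, so skipping the row group would silently drop the NaN row.
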



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
