[ 
https://issues.apache.org/jira/browse/IMPALA-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609491#comment-16609491
 ] 

Michal Ostrowski commented on IMPALA-6661:
------------------------------------------

For consistency's sake, I think we have to have NaN != NaN.

[https://github.com/apache/impala/blob/master/docs/shared/impala_common.xml#L2714-L2718]

This in turn means that we cannot collapse the NaNs into a single group, since 
that would make the count/group by query inconsistent with a non-aggregating 
query.
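
To make the argument concrete, here is a minimal Python sketch (not Impala 
code; the helper names double_from_bits, bits_from_double and 
group_by_equality are made up for illustration). It assumes IEEE 754 doubles 
and groups values using nothing but ==, which is roughly what an equality 
predicate in a non-aggregating query would see. It also shows that two NaNs 
can carry different bit patterns, which is the Equals()/Hash() concern raised 
in the issue description.

```python
import math
import struct


def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_from_double(value: float) -> int:
    """Return the raw 64-bit pattern of an IEEE 754 double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def group_by_equality(values):
    """Group values using only ==, returning a list of [key, count] pairs."""
    groups = []
    for v in values:
        for g in groups:
            if g[0] == v:  # NaN never compares equal, not even to itself
                g[1] += 1
                break
        else:
            groups.append([v, 1])
    return groups


# Two quiet NaNs that differ only in their payload bits (illustrative values).
nan_a = double_from_bits(0x7FF8000000000000)
nan_b = double_from_bits(0x7FF8000000000001)

values = [1.0, nan_a, nan_b, 1.0, nan_a]
groups = group_by_equality(values)

for key, count in groups:
    print(key, count)
```

With NaN != NaN, every NaN row lands in its own group while the two 1.0 rows 
share one. Collapsing the NaNs into a single group would require the grouping 
code to treat them as equal, which an ordinary equality comparison does not.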

 

> Group by float results in one group per NaN value
> -------------------------------------------------
>
>                 Key: IMPALA-6661
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6661
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Michal Ostrowski
>            Priority: Major
>              Labels: correctness, perf, ramp-up
>
> I don't know if this is the desired behaviour but it could be problematic for 
> some users since it will blow up the number of distinct groups in an 
> aggregation. I suspect that it's more useful to coalesce all the NaNs into a 
> single group, similar to how NULL is handled in GROUP BY.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("nan" as float)), 
> (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
> +----------------------+
> | cast('nan' as float) |
> +----------------------+
> | NaN                  |
> | NaN                  |
> | NaN                  |
> +----------------------+
> Fetched 3 row(s) in 0.11s
> {noformat}
> I suspect IMPALA-6069 slightly changed the behaviour here, although it would 
> have been broken beforehand anyway, since not all NaNs have the same bit 
> pattern, so Equals() and Hash() were inconsistent.
> We should decide what the preferred behaviour is and tweak the behaviour of 
> the hash table to produce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
