[ https://issues.apache.org/jira/browse/IMPALA-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michal Ostrowski resolved IMPALA-6661.
--------------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
Fixed with 15d48c3205778ce775270feac10186e8e4851d7c
> Group by float results in one group per NaN value
> -------------------------------------------------
>
> Key: IMPALA-6661
> URL: https://issues.apache.org/jira/browse/IMPALA-6661
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0, Impala 2.12.0
> Reporter: Tim Armstrong
> Assignee: Michal Ostrowski
> Priority: Major
> Labels: correctness, perf, ramp-up
> Fix For: Impala 3.1.0
>
>
> I don't know if this is the desired behaviour, but it could be problematic for
> some users, since it will blow up the number of distinct groups in an
> aggregation. I suspect it's more useful to coalesce all the NaNs into a
> single group, similar to how NULL is handled in GROUP BY.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("nan" as float)),
> (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
> +----------------------+
> | cast('nan' as float) |
> +----------------------+
> | NaN                  |
> | NaN                  |
> | NaN                  |
> +----------------------+
> Fetched 3 row(s) in 0.11s
> {noformat}
> I suspect IMPALA-6069 slightly changed the behaviour here, although it would
> have been broken beforehand anyway, since not all NaNs have the same bit
> pattern, so Equals() and Hash() were inconsistent.
> We should decide what the preferred behaviour is and tweak the hash table to
> produce it.
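The Equals()/Hash() mismatch described above can be illustrated with a minimal C++ sketch. It is not Impala's hash table code and not the change in commit 15d48c3; it only assumes IEEE 754 binary32 floats, and every name in it (FloatBits, FloatFromBits, CanonicalizeGroupKey, GroupKeyHash, GroupKeyEq, CountGroupsByRawBits, CountGroupsCanonical) is hypothetical. Hashing and comparing by raw bit pattern puts each distinct NaN encoding in its own group, while a hash/equality pair that first maps every NaN to one canonical NaN places them all in a single group, similar to NULL. Folding -0.0 into +0.0 is an extra assumption of this sketch, included because it is the same kind of "equal values, different bits" case.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Reinterpret a float's bit pattern without undefined behaviour.
inline uint32_t FloatBits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float FloatFromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical canonicalization: every NaN becomes one quiet NaN and
// -0.0f becomes +0.0f, so values that should share a group share bits.
inline float CanonicalizeGroupKey(float f) {
  if (std::isnan(f)) return std::numeric_limits<float>::quiet_NaN();
  if (f == 0.0f) return 0.0f;  // true for both +0.0f and -0.0f
  return f;
}

// Hash and equality are both defined on the canonical bits, so keys that
// compare equal are guaranteed to hash equally.
struct GroupKeyHash {
  std::size_t operator()(float f) const {
    return std::hash<uint32_t>{}(FloatBits(CanonicalizeGroupKey(f)));
  }
};

struct GroupKeyEq {
  bool operator()(float a, float b) const {
    return FloatBits(CanonicalizeGroupKey(a)) ==
           FloatBits(CanonicalizeGroupKey(b));
  }
};

// Groups formed when keys are hashed and compared by raw bit pattern:
// each distinct NaN encoding lands in its own group.
inline std::size_t CountGroupsByRawBits(const std::vector<float>& keys) {
  std::unordered_set<uint32_t> groups;
  for (float k : keys) groups.insert(FloatBits(k));
  return groups.size();
}

// Groups formed with the NaN-coalescing hash/equality pair above.
inline std::size_t CountGroupsCanonical(const std::vector<float>& keys) {
  std::unordered_set<float, GroupKeyHash, GroupKeyEq> groups;
  for (float k : keys) groups.insert(k);
  return groups.size();
}
```

For example, the NaN encodings 0x7fc00000, 0xffc00000 (sign bit set) and 0x7fc00001 (non-zero payload), built with FloatFromBits, form three groups under CountGroupsByRawBits but one group under CountGroupsCanonical. Defining both functors on the same canonical bits is the design choice that keeps Equals() and Hash() consistent by construction.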
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)