[ https://issues.apache.org/jira/browse/IMPALA-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673257#comment-16673257 ]

ASF subversion and git services commented on IMPALA-6661:
---------------------------------------------------------

Commit 15d48c3205778ce775270feac10186e8e4851d7c in impala's branch refs/heads/master from [~mostrows]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=15d48c3 ]

IMPALA-6661 Make NaN values equal for grouping purposes.

Similar to the treatment of NULLs, we want to consider NaN values
as equal when grouping.

- When detecting a NaN in a set of row values, the NaN value must
  be converted to a canonical value - so that all NaN values have
  the same bit-pattern for hashing purposes.
- When doing equality evaluation, floating point types must have
  additional logic to consider NaN values as equal.
- Existing logic for handling NULLs in this way is appropriate for
  triggering this behavior for NaN values.
- Relabel "force null equality" as "inclusive equality" to expand
  the scope of the concept to a more generic form that includes NaN.

Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Reviewed-on: http://gerrit.cloudera.org:8080/11535
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Ho <k...@cloudera.com>


> Group by float results in one group per NaN value
> -------------------------------------------------
>
>                 Key: IMPALA-6661
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6661
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Michal Ostrowski
>            Priority: Major
>              Labels: correctness, perf, ramp-up
>             Fix For: Impala 3.1.0
>
>
> I don't know if this is the desired behaviour but it could be problematic for
> some users since it will blow up the number of distinct groups in an
> aggregation. I suspect that it's more useful to coalesce all the NaNs into a
> single group, similar to how NULL is handled in GROUP BY.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("nan" as float)),
> (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
> +----------------------+
> | cast('nan' as float) |
> +----------------------+
> | NaN                  |
> | NaN                  |
> | NaN                  |
> +----------------------+
> Fetched 3 row(s) in 0.11s
> {noformat}
> I suspect IMPALA-6069 slightly changed the behaviour here, although it would
> have been broken beforehand anyway, since not all NaNs have the same bit
> pattern, so Equals() and Hash() were inconsistent.
> We should decide what the preferred behaviour is and tweak the behaviour of
> the hash table to produce it.
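
The two mechanisms named in the commit message (canonicalizing NaN before hashing, and an equality check that treats NaNs as equal) can be sketched roughly as below. This is a minimal standalone illustration, not Impala's actual code: the names FloatFromBits, CanonicalizeNaN, HashFloat and InclusiveEquals are hypothetical, and std::hash stands in for whatever hash function the backend really uses. It covers NaN only; signed zeros, which also compare equal while differing in bits, are deliberately left out.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>

// Hypothetical helper: build a float from a raw 32-bit pattern, so that
// NaNs with different payloads or sign bits can be constructed for testing.
inline float FloatFromBits(std::uint32_t bits) {
  float v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}

// Hypothetical helper: map every NaN to one canonical NaN so that all NaNs
// share a single bit pattern. Non-NaN values pass through unchanged.
inline float CanonicalizeNaN(float v) {
  return std::isnan(v) ? std::numeric_limits<float>::quiet_NaN() : v;
}

// Hypothetical helper: hash the raw bits of the canonicalized value.
// Without the canonicalization step, two NaNs could hash differently.
inline std::size_t HashFloat(float v) {
  v = CanonicalizeNaN(v);
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return std::hash<std::uint32_t>{}(bits);
}

// Hypothetical helper: "inclusive" equality. Plain operator== is false for
// NaN == NaN, so the NaN case is checked explicitly first.
inline bool InclusiveEquals(float a, float b) {
  if (std::isnan(a) && std::isnan(b)) return true;
  return a == b;
}
```

With helpers like these, FloatFromBits(0x7fc00000u) and FloatFromBits(0xffc00001u) are two NaNs with different bit patterns: operator== reports them as unequal, while InclusiveEquals returns true and HashFloat gives both the same hash, which is the consistency between Equals() and Hash() that the issue description asks for.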