[
https://issues.apache.org/jira/browse/SPARK-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-9079:
-------------------------------
Description:
1. What should NaN = NaN return?
2. If a GROUP BY key column contains NaN, should all NaN values fall into
one group, or should each NaN form its own group?
3. What about NaN in join keys?
4. When aggregating over columns containing NaN, should the result be NaN, or
should the result exclude NaN values (treating them like nulls)?
5. Where should NaN go in sorting?
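
To make questions 1, 2 and 4 concrete, here is a minimal Python sketch of the two candidate behaviours for each. It is not Spark code and does not say which option Spark should pick. The helper names (group_strict, group_nan_together) are hypothetical and exist only for this illustration.

```python
import math

nan_a = float("nan")
nan_b = float("inf") - float("inf")  # a second NaN produced by arithmetic

# Question 1: under IEEE 754 comparison, NaN is not equal to anything,
# including another NaN or itself.
ieee_equal = (nan_a == nan_b)
self_equal = (nan_a == nan_a)


def group_strict(values):
    """Group values using == only, so every NaN lands in its own group."""
    groups = []
    for v in values:
        for g in groups:
            if g[0] == v:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups


def group_nan_together(values):
    """Group values, treating all NaNs as one key (hypothetical option)."""
    groups = []
    for v in values:
        for g in groups:
            if g[0] == v or (math.isnan(g[0]) and math.isnan(v)):
                g.append(v)
                break
        else:
            groups.append([v])
    return groups


# Question 2: the same input gives a different number of groups
# depending on which equality the grouping uses.
keys = [1.0, nan_a, 1.0, nan_b]
strict_groups = group_strict(keys)
merged_groups = group_nan_together(keys)

# Question 4: plain floating-point addition propagates NaN, while a
# null-like treatment skips NaN values before aggregating.
column = [1.0, nan_a, 2.0]
sum_propagating = sum(column)
sum_skipping = sum(v for v in column if not math.isnan(v))
```

The same choice of equality applies to question 3: with strict IEEE equality a NaN join key never matches anything, while with the "all NaNs are one key" rule NaN rows on both sides would join to each other.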
Note that question 5 is much more important than the other four, since the
sorter currently throws exceptions on NaN values. See SPARK-8797.
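
For question 5, the underlying difficulty is that the IEEE 754 comparison operators do not give a total order once NaN is present, so a comparison-based sort has no well-defined position for it. The Python sketch below shows that, and one possible answer (NaN sorts after every other value). The "NaN last" rule and the nan_last_key helper are assumptions made for illustration, not a decision recorded in this issue.

```python
import math

nan = float("nan")

# With NaN, none of <, >, == holds against an ordinary number, so the
# usual comparison operators cannot place NaN relative to other values.
nan_less = nan < 1.0
nan_greater = nan > 1.0
nan_equal = nan == 1.0


def nan_last_key(x):
    """Sort key that defines a total order with all NaNs after real numbers.

    The first tuple element separates NaN from non-NaN values; the second
    replaces NaN with a fixed placeholder so NaNs never get compared
    with each other using IEEE semantics.
    """
    is_nan = math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)


data = [3.0, float("nan"), 1.0, float("nan"), 2.0]
ordered = sorted(data, key=nan_last_key)
```

Placing NaN first instead only requires inverting the first element of the key; either way, the point is that the sorter needs some explicit rule rather than the raw floating-point comparison.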
> Design NaN semantics
> --------------------
>
> Key: SPARK-9079
> URL: https://issues.apache.org/jira/browse/SPARK-9079
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin