[jira] [Updated] (SPARK-9079) Design NaN semantics

Reynold Xin (JIRA) Wed, 15 Jul 2015 15:08:23 -0700

     [ https://issues.apache.org/jira/browse/SPARK-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reynold Xin updated SPARK-9079:
-------------------------------
    Description: 
1. What should NaN = NaN return?

2. If we see NaN in a GROUP BY key column, should we group all NaN values into 
one group, or put each into a different group?

3. What about NaN in join keys?

4. When aggregating over columns containing NaN, should the result be NaN, or 
should the result exclude NaN values (treating them like nulls)?

5. Where should NaN go in sorting?

Note that question 5 is much more important than the other four, since right 
now the sorter throws exceptions on NaN values. See SPARK-8797.
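
To make the five questions concrete, here is a minimal sketch in plain Python (not Spark code) of what IEEE 754 floats do by default and what each alternative could look like. The helper names (group_key, group_counts, sum_propagate, sum_skip_nan, nan_last_key) are hypothetical and exist only for this illustration; placing NaN last in the sort is one possible choice, not a decision recorded in this issue.

```python
import math

nan = float("nan")

# Question 1: under IEEE 754 comparison, NaN is not equal to itself.
nan_equals_nan = (nan == nan)

# Questions 2 and 3: to put all NaN values into one group (or match them
# in a join), the key has to be canonicalized first.
def group_key(x):
    """Hypothetical helper: map every NaN to one shared sentinel key."""
    if isinstance(x, float) and math.isnan(x):
        return "NaN"
    return x

def group_counts(values):
    """Count occurrences per key, treating all NaN values as one group."""
    counts = {}
    for v in values:
        k = group_key(v)
        counts[k] = counts.get(k, 0) + 1
    return counts

# Question 4: two candidate aggregate behaviours.
def sum_propagate(values):
    """NaN poisons the result: any NaN input yields NaN."""
    return sum(values)

def sum_skip_nan(values):
    """NaN is ignored, the way nulls usually are."""
    return sum(v for v in values if not math.isnan(v))

# Question 5: a sort key that defines a total order with NaN after
# every other value, so the comparison never depends on NaN < x.
def nan_last_key(x):
    return (1, 0.0) if math.isnan(x) else (0, x)

data = [3.0, nan, 1.0, float("nan"), 2.0]
ordered = sorted(data, key=nan_last_key)
```

With this key, every NaN compares equal to every other NaN and greater than all ordinary values, which gives the sorter the consistent ordering it needs.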


> Design NaN semantics
> --------------------
>
>                 Key: SPARK-9079
>                 URL: https://issues.apache.org/jira/browse/SPARK-9079
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin


