[jira] [Updated] (SPARK-9079) Design NaN semantics

Reynold Xin (JIRA) Fri, 17 Jul 2015 17:15:45 -0700

     [ https://issues.apache.org/jira/browse/SPARK-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Reynold Xin updated SPARK-9079:
-------------------------------
    Description: 
1. What should NaN = NaN return?

NaN = NaN should return true.

2. If we see NaN in the group by key column, should we group NaN values into 
one group, or into different groups?

All NaN values should be grouped together.

3. What about NaN in join keys?

NaN should be treated as a normal value in join keys.
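
Items 1 to 3 all follow from treating every NaN as one and the same key value. The sketch below illustrates that idea in plain Python, where NaN != NaN under IEEE 754 comparison. It is not Spark code: the names normalize_key, nan_equal, group_by, and inner_join are hypothetical helpers introduced only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical sentinel standing in for "the one NaN value".
_NAN_KEY = ("nan",)


def normalize_key(value):
    """Map every NaN to one shared key; leave other values unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return _NAN_KEY
    return value


def nan_equal(a, b):
    """Equality under the proposed semantics: NaN = NaN is true."""
    return normalize_key(a) == normalize_key(b)


def group_by(rows, key_index):
    """Group rows by one column; all NaN keys land in a single group."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_key(row[key_index])].append(row)
    return dict(groups)


def inner_join(left, right):
    """Join two lists of (key, value) pairs; NaN matches NaN like any value."""
    index = defaultdict(list)
    for key, value in right:
        index[normalize_key(key)].append(value)
    return [
        (key, left_value, right_value)
        for key, left_value in left
        for right_value in index.get(normalize_key(key), [])
    ]
```

Normalizing the key once, before hashing or comparing, keeps equality, grouping, and joins consistent with each other, which is the property the three answers above ask for.
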

4. When aggregating over columns containing NaN, should the result be NaN, or 
should the result exclude NaN values (treating them like nulls)?

This is TO BE DECIDED. By default, the behavior is to return NaN.
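
The two candidate behaviors for item 4 can be contrasted in a small plain-Python sketch. The function total and its skip_nan flag are hypothetical names used only here; they do not correspond to any Spark API, and the issue leaves the final choice open.

```python
import math


def total(values, skip_nan=False):
    """Sum a list of floats.

    skip_nan=False: any NaN in the input makes the result NaN
                    (the default behavior described above).
    skip_nan=True:  NaN values are dropped first, i.e. treated like nulls.
    """
    if skip_nan:
        values = [v for v in values if not math.isnan(v)]
    return sum(values, 0.0)
```
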


5. Where should NaN go in sorting?

NaN should go last when in ascending order, larger than any other numeric value.


Note that item 5 is much more important than the other four, since the sorter 
currently throws exceptions on NaN values. See SPARK-8797.
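
The ordering in item 5 can be sketched in plain Python with a sort key that gives NaN a well-defined position. This is only an illustration of the proposed semantics, not the Spark sorter: nan_last_key and sort_floats are hypothetical names, and placing NaN first in descending order is an assumption that follows from "larger than any other numeric value".

```python
import math


def nan_last_key(value):
    """Sort key under which NaN compares larger than every other float."""
    if math.isnan(value):
        return (1, 0.0)  # all NaNs share one key, so they tie with each other
    return (0, value)


def sort_floats(values, descending=False):
    """Sort floats with a total order: NaN last ascending, first descending."""
    return sorted(values, key=nan_last_key, reverse=descending)
```

Because the key never compares a NaN directly against another float, the comparison is a total order, including against positive infinity.
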


  was:
1. What should NaN = NaN return?

NaN = NaN should return true.

2. If we see NaN in the group by key column, should we group NaN values into 
one group, or into different groups?

All NaN values should be grouped together.

3. What about NaN in join keys?

NaN should be treated as a normal value in join keys.

4. When aggregating over columns containing NaN, should the result be NaN, or 
should the result exclude NaN values (treating them like nulls)?

5. Where should NaN go in sorting?

NaN should go last when in ascending order, larger than any other numeric value.


Note that 5 is much more important than the other 4 since right now the sorter 
throws exceptions on NaN values. See SPARK-8797.



> Design NaN semantics
> --------------------
>
>                 Key: SPARK-9079
>                 URL: https://issues.apache.org/jira/browse/SPARK-9079
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>             Fix For: 1.5.0
>
>
> 1. What should NaN = NaN return?
> NaN = NaN should return true.
> 2. If we see NaN in the group by key column, should we group NaN values into 
> one group, or into different groups?
> All NaN values should be grouped together.
> 3. What about NaN in join keys?
> NaN should be treated as a normal value in join keys.
> 4. When aggregating over columns containing NaN, should the result be NaN, or 
> should the result exclude NaN values (treating them like nulls)?
> This is TO BE DECIDED. By default, the behavior is to return NaN.
> 5. Where should NaN go in sorting?
> NaN should go last when in ascending order, larger than any other numeric 
> value.
> Note that item 5 is much more important than the other four, since the 
> sorter currently throws exceptions on NaN values. See SPARK-8797.



