[ 
https://issues.apache.org/jira/browse/SPARK-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012335#comment-15012335
 ] 

Yin Huai commented on SPARK-11452:
----------------------------------

I think you hit https://issues.apache.org/jira/browse/SPARK-11009. It has been 
fixed in Spark 1.5.2.

> Window functions give invalid values
> ------------------------------------
>
>                 Key: SPARK-11452
>                 URL: https://issues.apache.org/jira/browse/SPARK-11452
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: yarn-client on Amazon EMR
>            Reporter: Jarno Seppanen
>             Fix For: 1.5.2
>
>
> I'm getting incorrect results from several window functions:
> * negative values from ROW_NUMBER()
> * incorrect floats from SUM(...) OVER (...)
> * huge integers from COUNT(...) OVER (...)
> Here's an example pyspark session that shows the incorrect behaviors:
> {code}
> >>> logout_acc.printSchema()
> root
>  |-- account: long (nullable = true)
>  |-- avatar: long (nullable = true)
>  |-- timestamp: timestamp (nullable = true)
>  |-- session_dur: double (nullable = true)
> >>> session_numbers = sqlContext.sql("""
> ...   SELECT account, avatar, timestamp,
> ...     COUNT(*) OVER (
> ...       PARTITION BY account ORDER BY timestamp) AS session,
> ...     ROW_NUMBER() OVER (
> ...       PARTITION BY account ORDER BY timestamp) AS row_num,
> ...     SUM(session_dur) OVER (
> ...       PARTITION BY account ORDER BY timestamp) AS playtime
> ...   FROM logout_acc
> ... """)
> >>> session_numbers.cache().registerTempTable('session_numbers')
> >>> session_numbers.show()
> +-------+------+--------------------+-------+-----------+--------------+
> |account|avatar|           timestamp|session|    row_num|      playtime|
> +-------+------+--------------------+-------+-----------+--------------+
> |   ...1|  ...1|2015-09-01 00:04:...|      1| 1261342896| 4.991264E-317|
> |   ...2|  ...2|2015-09-01 00:40:...|      1| -900190400| 4.991363E-317|
> |   ...2|  ...2|2015-09-01 00:43:...|      2| -695150400| 4.991363E-317|
> |   ...2|  ...2|2015-09-01 00:50:...|      3| -278701400| 4.991363E-317|
> |   ...2|  ...2|2015-09-01 01:25:...|      4| 1824111600| 4.991363E-317|
> |   ...2|  ...2|2015-09-01 01:28:...|      5| 1992217600| 4.991363E-317|
> |   ...2|  ...2|2015-09-01 01:31:...|      6|-2127806696| 4.991363E-317|
> |   ...2|  ...2|2015-09-03 00:47:...|      7|  574647760| 4.991363E-317|
> |   ...3|  ...3|2015-09-01 01:16:...|      1| 1269805600|4.9914617E-317|
> |   ...3|  ...3|2015-09-01 01:22:...|      2| 1621270600|4.9914617E-317|
> |   ...3|  ...3|2015-09-01 14:52:...|      3|-1269231952|4.9914617E-317|
> |   ...3|  ...3|2015-09-01 14:59:...|      4| -865949952|4.9914617E-317|
> |   ...3|  ...3|2015-09-04 05:05:...|      5| -678665344|4.9914617E-317|
> |   ...3|  ...3|2015-09-05 16:19:...|      6| 1645213072|4.9914617E-317|
> |   ...3|  ...3|2015-09-05 17:09:...|      7|  339651776|4.9914617E-317|
> |   ...3|  ...3|2015-09-05 20:27:...|      8| -669300112|4.9914617E-317|
> |   ...3|  ...3|2015-09-05 21:12:...|      9| 2032531888|4.9914617E-317|
> |   ...3|  ...3|2015-09-09 02:45:...|     10| 2046885648|4.9914617E-317|
> |   ...3|  ...3|2015-09-09 03:47:...|     11| 1493942352|4.9914617E-317|
> |   ...3|  ...3|2015-09-09 21:09:...|     12| -434263088|4.9914617E-317|
> +-------+------+--------------------+-------+-----------+--------------+
> >>> sqlContext.sql('select max(session) from session_numbers').show()
> +-------------------+                                                         
>   
> |                _c0|
> +-------------------+
> |4687099421149855744|
> +-------------------+
> {code}
> In fact, this was a transient problem, because after I re-ran the above 
> commands, the results came out correct. Unfortunately I also cannot share the 
> data to try and reproduce the problem, but I hope this report might be useful 
> nevertheless. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to