[ https://issues.apache.org/jira/browse/SPARK-31426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31426.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28189
[https://github.com/apache/spark/pull/28189]

> Regression in loading/saving timestamps from/to ORC files
> ---------------------------------------------------------
>
>                 Key: SPARK-31426
>                 URL: https://issues.apache.org/jira/browse/SPARK-31426
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Here are the results of DateTimeRebaseBenchmark on the current master branch:
> {code}
> Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> after 1582                                        59877          59877           0          1.7         598.8       0.0X
> before 1582                                       61361          61361           0          1.6         613.6       0.0X
> Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> after 1582, vec off                               48197          48288         118          2.1         482.0       1.0X
> after 1582, vec on                                38247          38351         128          2.6         382.5       1.3X
> before 1582, vec off                              53179          53359         249          1.9         531.8       0.9X
> before 1582, vec on                               44076          44268         269          2.3         440.8       1.1X
> {code}
> The results of the same benchmark on Spark 2.4.6-SNAPSHOT:
> {code}
> Save timestamps to ORC:                   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> after 1582                                        18858          18858           0          5.3         188.6       1.0X
> before 1582                                       18508          18508           0          5.4         185.1       1.0X
> Load timestamps from ORC:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> after 1582, vec off                               14063          14177         143          7.1         140.6       1.0X
> after 1582, vec on                                 5955           6029         100         16.8          59.5       2.4X
> before 1582, vec off                              14119          14126           7          7.1         141.2       1.0X
> before 1582, vec on                                5991           6007          25         16.7          59.9       2.3X
> {code}
>  Here is the PR with DateTimeRebaseBenchmark backported to 2.4: 
> https://github.com/MaxGekk/spark/pull/27
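
The issue does not state the size of the regression, but it can be estimated from the two runs quoted above. The following is a minimal sketch, not part of the original report: it divides the Best Time(ms) on master by the Best Time(ms) on 2.4.6-SNAPSHOT for each case. The dictionary names and the `slowdown` helper are illustrative only, and the comparison assumes both runs used the same hardware and row count, which the report does not confirm.

```python
# Best Time (ms) values copied from the two benchmark runs quoted in the issue.
# Names below are illustrative; they do not come from the Spark code base.
master_ms = {
    "save, after 1582": 59877,
    "save, before 1582": 61361,
    "load, after 1582, vec off": 48197,
    "load, after 1582, vec on": 38247,
    "load, before 1582, vec off": 53179,
    "load, before 1582, vec on": 44076,
}
branch_24_ms = {
    "save, after 1582": 18858,
    "save, before 1582": 18508,
    "load, after 1582, vec off": 14063,
    "load, after 1582, vec on": 5955,
    "load, before 1582, vec off": 14119,
    "load, before 1582, vec on": 5991,
}


def slowdown(current, baseline):
    """Return the current/baseline time ratio for every benchmark case."""
    return {case: current[case] / baseline[case] for case in current}


ratios = slowdown(master_ms, branch_24_ms)
for case, ratio in ratios.items():
    print(f"{case:<28} {ratio:.2f}x the 2.4.6-SNAPSHOT time")
```

On these numbers, every case takes more than three times as long on master as on 2.4.6-SNAPSHOT, and the vectorized loads show the largest ratios.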



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
