[ 
https://issues.apache.org/jira/browse/SPARK-31443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-31443:
-------------------------------
    Description: 
DateTimeBenchmark shows a performance regression in toJavaDate.

Spark 2.4.6-SNAPSHOT, measured at the PR [https://github.com/MaxGekk/spark/pull/27]:
{code:java}
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                  559            603          38          8.9         111.8       1.0X
Collect dates                                      2306           3221        1558          2.2         461.1       0.2X
{code}
Current master:
{code:java}
OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from Java's date-time:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                 1052           1130          73          4.8         210.3       1.0X
Collect dates                                      3251           4943        1624          1.5         650.2       0.3X
{code}
If we subtract the cost of preparing the DATE column (the "From java.sql.Date" row) from "Collect dates":
* Spark 2.4.6-SNAPSHOT is (461.1 - 111.8) = 349.3 ns/row
* master is (650.2 - 210.3) = 439.9 ns/row

The regression of toJavaDate in master against Spark 2.4.6-SNAPSHOT is (439.9 - 349.3)/349.3 = ~26%
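
The arithmetic above can be re-checked with a small standalone sketch. It is not part of DateTimeBenchmark; the class and method names are hypothetical, and the only inputs are the four Per Row(ns) values from the two tables.

```java
public class ToJavaDateRegression {
    // Per-row cost attributable to collecting dates: the "Collect dates" time
    // minus the time spent preparing the DATE column ("From java.sql.Date").
    static double netPerRowNs(double collectNs, double prepareNs) {
        return collectNs - prepareNs;
    }

    // Relative slowdown of `currentNs` against `baselineNs`, as a fraction.
    static double regression(double baselineNs, double currentNs) {
        return (currentNs - baselineNs) / baselineNs;
    }

    public static void main(String[] args) {
        // Per Row(ns) values copied from the benchmark output above.
        double spark24 = netPerRowNs(461.1, 111.8);
        double master = netPerRowNs(650.2, 210.3);
        System.out.printf("2.4.6-SNAPSHOT: %.1f ns/row%n", spark24);
        System.out.printf("master:         %.1f ns/row%n", master);
        System.out.printf("regression:     %.0f%%%n", 100 * regression(spark24, master));
    }
}
```

Subtracting the preparation row first matters because the DATE column setup itself is slower on master, so comparing the raw "Collect dates" rows would mix two effects.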

  was:
DateTimeBenchmark shows the regression

Spark 2.4.6-SNAPSHOT at the PR https://github.com/MaxGekk/spark/pull/27
{code}
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from java.sql.Timestamp:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                  614            655          43          8.1         122.8       1.0X
{code}

Current master:
{code}
================================================================================================
Conversion from/to external types
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
To/from java.sql.Timestamp:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
From java.sql.Date                                 1154           1206          46          4.3         230.9       1.0X
{code}

The regression is ~2x (230.9 vs. 122.8 ns per row).


> Perf regression of toJavaDate
> -----------------------------
>
>                 Key: SPARK-31443
>                 URL: https://issues.apache.org/jira/browse/SPARK-31443
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Maxim Gekk
>            Priority: Major
>


