Raphael Luta created SPARK-35494:
------------------------------------

             Summary: Timestamp casting performance issue when invoked with timezone
                 Key: SPARK-35494
                 URL: https://issues.apache.org/jira/browse/SPARK-35494
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.8, 2.4.7
            Reporter: Raphael Luta
             Fix For: 3.0.0


In Spark SQL, converting a datetime string column to a timestamp with cast 
or to_timestamp is much slower when the source string contains timezone 
information (for example 2021-05-24T00:00:00+02:00).

This simple benchmark illustrates the difference: strings with a timezone 
take more than four times as long per row (about 1416 ns versus 326 ns) as 
strings without one.

 
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Timestamp Conversion:   Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------
no tz                   1368 / 1379         3,1         326,1         1,0X
with UTC tz             5940 / 5947         0,7         1416,2        0,2X
with hours tz           5940 / 5962         0,7         1416,2        0,2X
{noformat}
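
The benchmark code itself is not included in this message. As a rough way to reproduce the comparison, a minimal PySpark sketch could time the cast over the same three input shapes. The helper names (make_rows, time_cast), the row count, and the use of "Z" as the UTC suffix are assumptions made for illustration, not details taken from the report.

```python
import time
from datetime import datetime, timedelta

# Hypothetical start date, chosen to match the example string in the report.
BASE = datetime(2021, 5, 24)

# Assumed suffixes for the three benchmark cases; the report does not say
# how the "UTC" case was written, so "Z" is a guess.
SUFFIXES = {"no tz": "", "with UTC tz": "Z", "with hours tz": "+02:00"}


def make_rows(n, suffix):
    """Build n one-column rows of ISO-8601 strings, one second apart."""
    return [
        ((BASE + timedelta(seconds=i)).strftime("%Y-%m-%dT%H:%M:%S") + suffix,)
        for i in range(n)
    ]


def time_cast(spark, suffix, n=100_000):
    """Return wall-clock seconds spent casting n strings to timestamp."""
    # Imported lazily so make_rows can be used without Spark installed.
    from pyspark.sql import functions as F

    df = spark.createDataFrame(make_rows(n, suffix), ["ts_str"]).cache()
    df.count()  # materialize the cache so input creation is not timed

    start = time.perf_counter()
    # The aggregate forces every row to be cast before collect() returns.
    df.select(F.col("ts_str").cast("timestamp").alias("ts")) \
        .agg(F.max("ts")).collect()
    return time.perf_counter() - start
```

Given an existing SparkSession named spark, calling time_cast(spark, suffix) for each value in SUFFIXES gives one timing per case. This measures a single run only, so it is much cruder than the Best/Avg figures above and is meant just to show the relative gap.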
 


