Raphael Luta created SPARK-35494:
------------------------------------
Summary: Timestamp casting performance issue when invoked with
timezone
Key: SPARK-35494
URL: https://issues.apache.org/jira/browse/SPARK-35494
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.8, 2.4.7
Reporter: Raphael Luta
Fix For: 3.0.0
In Spark SQL, when converting a datetime string column to timestamp with cast
or to_timestamp, we have noticed a major performance issue when the source
string contains timezone information (for example, 2021-05-24T00:00:00+02:00).
This simple benchmark illustrates the difference:
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Timestamp Conversion:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
-----------------------------------------------------------------------------------
no tz                          1368 / 1379          3,1          326,1        1,0X
with UTC tz                    5940 / 5947          0,7         1416,2        0,2X
with hours tz                  5940 / 5962          0,7         1416,2        0,2X
{noformat}
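As a rough cross-check of the figures above, the following Python sketch recomputes the per-row time and the relative speed from the best times in the table. The row count of 4 * 1024 * 1024 is an assumption inferred from dividing the best time by the per-row time; the report itself does not state it. The helper names are hypothetical and only serve the illustration.

```python
# Best times in milliseconds, copied from the benchmark table above.
best_ms = {"no tz": 1368, "with UTC tz": 5940, "with hours tz": 5940}

# Assumption: 4 * 1024 * 1024 rows per case. This is inferred from
# best time / per-row time and is not stated in the report.
ROWS = 4 * 1024 * 1024


def per_row_ns(ms, rows=ROWS):
    """Average time per row in nanoseconds for a run that took `ms` milliseconds."""
    return ms * 1_000_000 / rows


def relative(ms, baseline_ms):
    """Speed relative to the baseline case (1.0 means equally fast)."""
    return baseline_ms / ms


baseline = best_ms["no tz"]
for name, ms in best_ms.items():
    slowdown = ms / baseline  # how many times longer than the baseline run
    print(f"{name:<14} {per_row_ns(ms):8.1f} ns/row  "
          f"{relative(ms, baseline):.2f}X  ({slowdown:.1f}x the baseline time)")
```

Under that assumption the recomputed per-row times agree with the table to within the rounding of the millisecond figures, and both timezone cases take a little over four times as long as the case without a timezone.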