Maciej Bryński created SPARK-22010:
--------------------------------------
Summary: Slow fromInternal conversion for TimestampType
Key: SPARK-22010
URL: https://issues.apache.org/jira/browse/SPARK-22010
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.2.0
Reporter: Maciej Bryński
To convert TimestampType values to Python, we use
`datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)`.
{code}
In [34]: %%timeit
...: datetime.datetime.fromtimestamp(1505383647).replace(microsecond=12344)
...:
4.2 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
{code}
It's slow because:
# we look up the local time zone on every conversion
# we call the replace method, which builds a second datetime object
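To see how much each of the two steps contributes, the calls can be timed separately. This is only a rough sketch: the helper names (`only_fromtimestamp`, `fromtimestamp_and_replace`, `time_per_call`) are made up for illustration, and the numbers will vary by machine and Python version.

```python
import datetime
import timeit

TS = 1505383647  # same sample value as in the measurement above


def only_fromtimestamp():
    # Local time zone lookup happens inside fromtimestamp.
    return datetime.datetime.fromtimestamp(TS)


def fromtimestamp_and_replace():
    # Current approach: fromtimestamp, then replace to set microseconds.
    return datetime.datetime.fromtimestamp(TS).replace(microsecond=12344)


def time_per_call(func, number=100000):
    # Best of 5 runs, returned as seconds per single call.
    return min(timeit.repeat(func, number=number, repeat=5)) / number


if __name__ == "__main__":
    for func in (only_fromtimestamp, fromtimestamp_and_replace):
        print(func.__name__, time_per_call(func))
```

The difference between the two results gives an estimate of the cost of the replace call alone.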
Proposed solution: implement a custom datetime conversion and move the TZ
calculation to module level, so it runs once instead of on every conversion.