This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c6f3a13 [SPARK-36626][PYTHON][FOLLOW-UP] Use datetime.tzinfo instead of datetime.tzname()
c6f3a13 is described below
commit c6f3a13087a954d56ef671ecb82c8031a2f45d52
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Sep 6 17:16:52 2021 +0200
[SPARK-36626][PYTHON][FOLLOW-UP] Use datetime.tzinfo instead of datetime.tzname()
### What changes were proposed in this pull request?
This PR is a small follow-up of https://github.com/apache/spark/pull/33876 which proposes to use `datetime.tzinfo` instead of `datetime.tzname` to check whether timezone information is provided or not.
This way is consistent with other places such as:
https://github.com/apache/spark/blob/9c5bcac61ee56fbb271e890cc33f9a983612c5b0/python/pyspark/sql/types.py#L182
https://github.com/apache/spark/blob/9c5bcac61ee56fbb271e890cc33f9a983612c5b0/python/pyspark/sql/types.py#L1662
### Why are the changes needed?
In some cases, `datetime.tzname` can raise an exception
(https://docs.python.org/3/library/datetime.html#datetime.datetime.tzname):
> ... raises an exception if the latter doesn’t return None or a string object,
I was able to reproduce this in Jenkins with setting
`spark.sql.timestampType` to `TIMESTAMP_NTZ` by default:
```
======================================================================
ERROR: test_time_with_timezone (pyspark.sql.tests.test_serde.SerdeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests/test_serde.py", line 92, in test_time_with_timezone
...
File "/usr/lib/pypy3/lib-python/3/datetime.py", line 979, in tzname
raise NotImplementedError("tzinfo subclass must override tzname()")
NotImplementedError: tzinfo subclass must override tzname()
```
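The failure mode above can be reproduced outside Spark. Below is a minimal, hypothetical sketch (the `FixedOffset` class is illustrative, not from the PR): a `tzinfo` subclass that overrides `utcoffset()` but not `tzname()` makes `datetime.tzname()` raise, while inspecting the `tzinfo` attribute directly never raises:

```python
from datetime import datetime, timedelta, tzinfo

# Hypothetical minimal tzinfo subclass that does NOT override tzname(),
# mirroring the situation that surfaced in the Jenkins traceback above.
class FixedOffset(tzinfo):
    def utcoffset(self, dt):
        return timedelta(hours=2)

aware = datetime(2021, 9, 6, 17, 16, 52, tzinfo=FixedOffset())

# tzname() may raise for such subclasses...
try:
    aware.tzname()
except NotImplementedError as e:
    print("tzname() raised:", e)

# ...while checking .tzinfo directly is always safe:
print(aware.tzinfo is None)  # False: timezone info is present

naive = datetime(2021, 9, 6, 17, 16, 52)
print(naive.tzinfo is None)  # True: no timezone info
```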
### Does this PR introduce _any_ user-facing change?
No to end users, because the change has not been released yet.
This is rather a safeguard to prevent potential breakage.
### How was this patch tested?
Manually tested.
Closes #33918 from HyukjinKwon/SPARK-36626-followup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
---
python/pyspark/sql/types.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 6cb8aec..e8b7411 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1045,7 +1045,7 @@ def _infer_type(obj, infer_dict_as_struct=False, prefer_timestamp_ntz=False):
if dataType is DecimalType:
# the precision and scale of `obj` may be different from row to row.
return DecimalType(38, 18)
- if dataType is TimestampType and prefer_timestamp_ntz and obj.tzname() is None:
+ if dataType is TimestampType and prefer_timestamp_ntz and obj.tzinfo is None:
return TimestampNTZType()
elif dataType is not None:
return dataType()
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]