This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c6f3a13 [SPARK-36626][PYTHON][FOLLOW-UP] Use datetime.tzinfo instead of datetime.tzname()
c6f3a13 is described below
commit c6f3a13087a954d56ef671ecb82c8031a2f45d52
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Sep 6 17:16:52 2021 +0200
[SPARK-36626][PYTHON][FOLLOW-UP] Use datetime.tzinfo instead of datetime.tzname()
### What changes were proposed in this pull request?
This PR is a small follow-up of https://github.com/apache/spark/pull/33876 which proposes to use `datetime.tzinfo` instead of `datetime.tzname` to check whether timezone information is provided or not.
This way is consistent with other places such as:
https://github.com/apache/spark/blob/9c5bcac61ee56fbb271e890cc33f9a983612c5b0/python/pyspark/sql/types.py#L182
https://github.com/apache/spark/blob/9c5bcac61ee56fbb271e890cc33f9a983612c5b0/python/pyspark/sql/types.py#L1662
### Why are the changes needed?
In some cases, `datetime.tzname` can raise an exception
(https://docs.python.org/3/library/datetime.html#datetime.datetime.tzname):
> ... raises an exception if the latter doesn’t return None or a string object,
I was able to reproduce this in Jenkins with setting
`spark.sql.timestampType` to `TIMESTAMP_NTZ` by default:
```
======================================================================
ERROR: test_time_with_timezone (pyspark.sql.tests.test_serde.SerdeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests/test_serde.py", line 92, in test_time_with_timezone
...
File "/usr/lib/pypy3/lib-python/3/datetime.py", line 979, in tzname
raise NotImplementedError("tzinfo subclass must override tzname()")
NotImplementedError: tzinfo subclass must override tzname()
```
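The failure mode above can be reproduced outside Spark. Below is a minimal, hypothetical sketch (the `FixedOffset` class is illustrative, not from the PR): a `tzinfo` subclass that overrides `utcoffset()` but not `tzname()` makes `datetime.tzname()` raise, while inspecting the `tzinfo` attribute directly never raises:

```python
from datetime import datetime, timedelta, tzinfo

# Hypothetical minimal tzinfo subclass that does NOT override tzname(),
# mirroring the situation that surfaced in the Jenkins traceback above.
class FixedOffset(tzinfo):
    def utcoffset(self, dt):
        return timedelta(hours=2)

aware = datetime(2021, 9, 6, 17, 16, 52, tzinfo=FixedOffset())

# tzname() may raise for such subclasses...
try:
    aware.tzname()
except NotImplementedError as e:
    print("tzname() raised:", e)

# ...while checking .tzinfo directly is always safe:
print(aware.tzinfo is None)  # False: timezone info is present

naive = datetime(2021, 9, 6, 17, 16, 52)
print(naive.tzinfo is None)  # True: no timezone info
```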
### Does this PR introduce _any_ user-facing change?
No to end users, because the change has not been released yet.
This is rather a safeguard to prevent potential breakage.
### How was this patch tested?
Manually tested.
Closes #33918 from HyukjinKwon/SPARK-36626-followup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
---
python/pyspark/sql/types.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 6cb8aec..e8b7411 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1045,7 +1045,7 @@ def _infer_type(obj, infer_dict_as_struct=False, prefer_timestamp_ntz=False):
if dataType is DecimalType:
# the precision and scale of `obj` may be different from row to row.
return DecimalType(38, 18)
- if dataType is TimestampType and prefer_timestamp_ntz and obj.tzname() is None:
+ if dataType is TimestampType and prefer_timestamp_ntz and obj.tzinfo is None:
return TimestampNTZType()
elif dataType is not None:
return dataType()
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]