Imran Rashid created SPARK-27389:
------------------------------------

             Summary: Odd failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
                 Key: SPARK-27389
                 URL: https://issues.apache.org/jira/browse/SPARK-27389
             Project: Spark
          Issue Type: Task
          Components: jenkins
    Affects Versions: 3.0.0
            Reporter: Imran Rashid
            Assignee: shane knapp


I've seen a few odd PR build failures w/ an error in the pyspark tests about 
"UnknownTimeZoneError: 'US/Pacific-New'", e.g.: 
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4688/consoleFull

A bit of searching tells me that US/Pacific-New probably isn't really supposed 
to be a timezone at all: 
https://mm.icann.org/pipermail/tz/2009-February/015448.html

I'm guessing that this is caused by some misconfiguration of jenkins. That said, I 
can't figure out what is wrong. There does seem to be a timezone entry for 
US/Pacific-New in {{/usr/share/zoneinfo/US/Pacific-New}}, but it seems to be 
there on every amp-jenkins-worker, so I don't see why that alone would cause this 
failure only some of the time.
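One way to check this on each worker is to compare the compiled zone files under /usr/share/zoneinfo with the names the Python side accepts. This is a minimal sketch, assuming pytz's list of names is what matters (the traceback ends in pytz's `timezone()`); the helper names are hypothetical, and compiled zone files are recognized by the `TZif` magic bytes at the start of the tzfile format:

```python
import os

TZIF_MAGIC = b"TZif"  # compiled tzfile data starts with these four bytes


def zone_names_on_disk(root="/usr/share/zoneinfo"):
    """Return zone names (e.g. 'US/Pacific') backed by a compiled TZif file
    under `root`. The top-level 'posix' and 'right' mirror trees are skipped."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune duplicate trees in place so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames if d not in ("posix", "right")]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as fh:
                    if fh.read(4) != TZIF_MAGIC:
                        continue  # text files such as zone.tab, leapseconds
            except OSError:
                continue
            rel = os.path.relpath(path, root)
            # Zone names always use '/', whatever the host path separator is.
            names.add(rel.replace(os.sep, "/"))
    return names


def zones_missing_from_python(known, root="/usr/share/zoneinfo"):
    """Sorted zone names the OS can resolve but the list `known` lacks."""
    return sorted(zone_names_on_disk(root) - set(known))
```

On a worker this could be called as `zones_missing_from_python(pytz.all_timezones)` using the same pytz the tests import; if US/Pacific-New shows up, the OS and pytz disagree about that name, which would be consistent with the traceback.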

[~shaneknapp] I am tentatively calling this a "jenkins" issue, but I might be 
totally wrong here, and it may really be a pyspark problem.

Full stack trace from the test failure:

{noformat}
======================================================================
ERROR: test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 522, in test_to_pandas
    pdf = self._to_pandas()
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py", line 517, in _to_pandas
    return df.toPandas()
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/dataframe.py", line 2189, in toPandas
    _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1891, in _check_series_convert_timestamps_local_tz
    return _check_series_convert_timestamps_localize(s, None, timezone)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1877, in _check_series_convert_timestamps_localize
    lambda ts: ts.tz_localize(from_tz, ambiguous=False).tz_convert(to_tz).tz_localize(None)
  File "/home/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
  File "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py", line 1878, in <lambda>
    if ts is not pd.NaT else pd.NaT)
  File "pandas/tslib.pyx", line 649, in pandas.tslib.Timestamp.tz_convert (pandas/tslib.c:13923)
  File "pandas/tslib.pyx", line 407, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:10447)
  File "pandas/tslib.pyx", line 1467, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:27504)
  File "pandas/tslib.pyx", line 1768, in pandas.tslib.maybe_get_tz (pandas/tslib.c:32362)
  File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 178, in timezone
    raise UnknownTimeZoneError(zone)
UnknownTimeZoneError: 'US/Pacific-New'
{noformat}
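The traceback shows the session time zone name being passed through `tz_localize(...).tz_convert(to_tz)` until pytz's `timezone()` rejects it deep inside pandas. A hypothetical guard (not something pyspark does today) could resolve the name up front: accept it if the Python tz database knows it, otherwise map a known legacy link name, otherwise fail early with a message pointing at `spark.sql.session.timeZone`, which as far as I know defaults to the JVM's local time zone. The alias table is an assumption: my understanding is that tzdata's retired `pacificnew` file linked US/Pacific-New to America/Los_Angeles, but that should be verified before relying on it.

```python
# Hypothetical alias table (assumption to verify): tzdata's old 'pacificnew'
# file linked US/Pacific-New to America/Los_Angeles.
LEGACY_ALIASES = {"US/Pacific-New": "America/Los_Angeles"}


class UnsupportedSessionTimeZone(ValueError):
    """Raised when the session time zone cannot be mapped to a known zone."""


def resolve_session_tz(name, known, aliases=LEGACY_ALIASES):
    """Return a zone name from `known` to use in place of `name`.

    `known` is the collection of names the Python tz library accepts
    (for example pytz.all_timezones_set). Unknown names are mapped through
    `aliases`; anything still unknown raises here, before any pandas call.
    """
    if name in known:
        return name
    target = aliases.get(name)
    if target is not None and target in known:
        return target
    raise UnsupportedSessionTimeZone(
        "session time zone %r is not known to the Python tz database; "
        "check spark.sql.session.timeZone and the JVM default time zone" % (name,)
    )
```

Failing early like this would at least turn the intermittent pandas error into a message that names the misconfigured setting.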



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)