[ https://issues.apache.org/jira/browse/SPARK-39821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-39821:
-----------------------------------
Labels: pull-request-available (was: )
> DatetimeIndex error during pyspark session
> ------------------------------------------
>
> Key: SPARK-39821
> URL: https://issues.apache.org/jira/browse/SPARK-39821
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.2
> Environment: OS: Ubuntu
> Python version: 3.8.13
> Reporter: bo zhao
> Priority: Minor
> Labels: pull-request-available
>
> {code:python}
> Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
> Spark context Web UI available at http://172.25.179.45:4042
> Spark context available as 'sc' (master = local[*], app id = local-1658283215853).
> SparkSession available as 'spark'.
> >>> from pyspark import pandas as ps
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It is required to set this environment variable to '1' in both driver and executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it does not work if there is a Spark context already launched.
> >>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   fields = [
> /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for column, series in pdf.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
>   for item in s.iteritems():
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, in __repr__
>     pindex = self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, in toPandas
>     series = series.astype(t, copy=False)
>   File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 386, in astype
>     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 308, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 526, in astype
>     new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 631, in astype
>     return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 504, in astype
>     raise TypeError(msg)
> TypeError: Cannot cast DatetimeArray to dtype datetime64
> {code}
> I started pyspark and entered ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) in the session.
> Assigning the index to a variable does not raise an error:
> {code:python}
> >>> a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> {code}
> The error is raised only when I evaluate a in the session, for example:
> {code:python}
> >>> a
> {code}
> So the problem is in the __repr__ path: displaying the index converts it to pandas through toPandas(), and that conversion fails (see the traceback above).
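The behavior described above matches the traceback. The REPL displays `a` by calling `repr(a)`, and the failing Spark-to-pandas conversion (`_get_or_create_repr_pandas_cache` -> `to_pandas_frame` -> `toPandas()`) runs only inside `__repr__`. The following plain-Python sketch reproduces only that control flow. `LazyIndex` and `failing_convert` are hypothetical names, not pandas-on-Spark APIs, and the conversion is assumed to fail with the same message as the real cast.

```python
class LazyIndex:
    """Hypothetical stand-in for an index whose display needs a conversion.

    This is NOT the pandas-on-Spark implementation; it only mirrors the
    "convert on first repr, then cache" pattern visible in the traceback.
    """

    def __init__(self, values, convert):
        # Construction only stores the inputs; the failing conversion is deferred.
        self._values = list(values)
        self._convert = convert
        self._repr_cache = None

    def __repr__(self):
        # The conversion runs the first time the object is displayed,
        # similar in spirit to _get_or_create_repr_pandas_cache.
        if self._repr_cache is None:
            self._repr_cache = self._convert(self._values)
        return f"LazyIndex({self._repr_cache!r})"


def failing_convert(values):
    # Hypothetical conversion that fails like the unit-less datetime64 cast.
    raise TypeError("Cannot cast DatetimeArray to dtype datetime64")


# Assignment succeeds because nothing is converted yet.
a = LazyIndex(["1970-01-01"] * 3, failing_convert)

error_message = None
try:
    repr(a)  # what the interactive prompt does when you type `a`
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the failure only surfaces at display time, a reproduction has to force `repr()` (or another call that converts to pandas). Assignment alone is not enough.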
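The innermost frames show `toPandas()` calling `series.astype(t, copy=False)` on a datetime column. The pandas development checkout under `/home/spark/upstream/pandas` then rejects the target dtype `datetime64`, which has no unit. The following pandas-only check assumes that `t` is that unit-less `datetime64` dtype; this is inferred from the error message, not confirmed from the Spark source. It compares a unit-less cast with an explicit `datetime64[ns]` cast. `try_cast` is a hypothetical helper. Whether the unit-less cast fails varies across pandas versions, so the sketch records the outcome rather than assuming it.

```python
import pandas as pd

# A datetime Series similar to what toPandas() builds for the index column.
s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-01", "1970-01-01"]))


def try_cast(series, dtype):
    """Hypothetical helper: return (succeeded, dtype name or error message)."""
    try:
        result = series.astype(dtype)
    except (TypeError, ValueError) as exc:
        return False, str(exc)
    return True, str(result.dtype)


# Unit-less target dtype, as named in the error; the outcome depends on the pandas version.
unitless = try_cast(s, "datetime64")

# Explicit nanosecond unit, which pandas accepts.
explicit = try_cast(s, "datetime64[ns]")

print("datetime64     ->", unitless)
print("datetime64[ns] ->", explicit)
```

If the unit-less cast fails on the pandas build in use while the `datetime64[ns]` cast succeeds, that points to the dtype chosen in `toPandas()` rather than to the index data itself.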
--
This message was sent by Atlassian Jira
(v8.20.10#820010)