[jira] [Updated] (SPARK-39821) DatetimeIndex error during pyspark session

ASF GitHub Bot (Jira) Thu, 26 Sep 2024 20:16:35 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-39821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated SPARK-39821:
-----------------------------------
    Labels: pull-request-available  (was: )

> DatetimeIndex error during pyspark session
> ------------------------------------------
>
>                 Key: SPARK-39821
>                 URL: https://issues.apache.org/jira/browse/SPARK-39821
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.2
>         Environment: OS: ubuntu
> Python version: 3.8.13
>            Reporter: bo zhao
>            Priority: Minor
>              Labels: pull-request-available
>
> {code:java}
> Using Python version 3.8.13 (default, Jun 29 2022 11:50:19)
> Spark context Web UI available at http://172.25.179.45:4042
> Spark context available as 'sc' (master = local[*], app id = 
> local-1658283215853).
> SparkSession available as 'spark'.
> >>> from pyspark import pandas as ps
> WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
> is required to set this environment variable to '1' in both driver and 
> executor sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you 
> but it does not work if there is a Spark context already launched.
> >>> ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01'])
> /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: 
> iteritems is deprecated and will be removed in a future version. Use .items 
> instead.
>   fields = [
> /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: 
> iteritems is deprecated and will be removed in a future version. Use .items 
> instead.
>   for column, series in pdf.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601:
>  FutureWarning: iteritems is deprecated and will be removed in a future 
> version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601:
>  FutureWarning: iteritems is deprecated and will be removed in a future 
> version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601:
>  FutureWarning: iteritems is deprecated and will be removed in a future 
> version. Use .items instead.
>   for item in s.iteritems():
> /home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/_pydevd_bundle/pydevd_utils.py:601:
>  FutureWarning: iteritems is deprecated and will be removed in a future 
> version. Use .items instead.
>   for item in s.iteritems():
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/spark/spark/python/pyspark/pandas/indexes/base.py", line 2770, 
> in __repr__
>     pindex = 
> self._psdf._get_or_create_repr_pandas_cache(max_display_count).index
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12780, in 
> _get_or_create_repr_pandas_cache
>     self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
>   File "/home/spark/spark/python/pyspark/pandas/frame.py", line 12775, in 
> _to_internal_pandas
>     return self._internal.to_pandas_frame
>   File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in 
> wrapped_lazy_property
>     setattr(self, attr_name, fn(self))
>   File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1056, in 
> to_pandas_frame
>     pdf = sdf.toPandas()
>   File "/home/spark/spark/python/pyspark/sql/pandas/conversion.py", line 248, 
> in toPandas
>     series = series.astype(t, copy=False)
>   File "/home/spark/upstream/pandas/pandas/core/generic.py", line 6095, in 
> astype
>     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 
> 386, in astype
>     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/internals/managers.py", line 
> 308, in apply
>     applied = getattr(b, f)(**kwargs)
>   File "/home/spark/upstream/pandas/pandas/core/internals/blocks.py", line 
> 526, in astype
>     new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 299, 
> in astype_array_safe
>     new_values = astype_array(values, dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/dtypes/astype.py", line 227, 
> in astype_array
>     values = values.astype(dtype, copy=copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimes.py", line 
> 631, in astype
>     return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
>   File "/home/spark/upstream/pandas/pandas/core/arrays/datetimelike.py", line 
> 504, in astype
>     raise TypeError(msg)
> TypeError: Cannot cast DatetimeArray to dtype datetime64
>  {code}
> I started pyspark and entered ps.DatetimeIndex(['1970-01-01', '1970-01-01', 
> '1970-01-01']) in the session, which produced the traceback above.
> However, an assignment like the one below does not raise an error:
> {code:java}
> a = ps.DatetimeIndex(['1970-01-01', '1970-01-01', '1970-01-01']) {code}
> The error is raised only when I evaluate a in the session, such as
> {code:java}
> >>> a
> {code}
> So the problem appears to be in the __repr__ function, which the traceback 
> shows calling toPandas() and failing in series.astype.
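
The difference between the two cases in the report is consistent with how the interactive Python prompt behaves: it calls repr() on the value of a bare expression, but not on an assignment. The sketch below illustrates this with a hypothetical LazyIndex class. It is an assumption made for illustration only, not pyspark code, and it simply re-raises the message from the traceback to show where such a failure would surface.

```python
class LazyIndex:
    """Hypothetical stand-in for a lazily evaluated index (not pyspark code)."""

    def __init__(self, values):
        # Construction only stores the input; nothing is converted yet.
        self.values = list(values)

    def __repr__(self):
        # Rendering forces the conversion, so a failure would surface here.
        raise TypeError("Cannot cast DatetimeArray to dtype datetime64")


# Assignment: the interactive prompt does not call repr(), so no error occurs.
a = LazyIndex(["1970-01-01", "1970-01-01", "1970-01-01"])

# Evaluating the bare name at the prompt is roughly equivalent to repr(a).
try:
    repr(a)
    outcome = "no error"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```

Under that assumption, constructing the object succeeds and only displaying it fails, which matches the behaviour described in the report.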



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
