[ https://issues.apache.org/jira/browse/SPARK-44946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Roels updated SPARK-44946:
-----------------------------------
    Description: 
When converting a Spark DataFrame into a pandas DataFrame, we get a 
FutureWarning when the DataFrame contains columns of type {{timestamp}}.

Reproducible example (that you can run locally):
{code:python}
from datetime import datetime

from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession.builder.getOrCreate()

df = pd.DataFrame({"foo": [datetime(2023, 1, 1), datetime(2023, 1, 1)]})

df_sp = spark.createDataFrame(df)

test = df_sp.toPandas()

# Warning emitted on the toPandas() call:
# /usr/local/lib/python3.10/site-packages/pyspark/sql/pandas/conversion.py:251:
# FutureWarning: Passing unit-less datetime64 dtype to .astype is deprecated and
# will raise in a future version. Pass 'datetime64[ns]' instead

{code}
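The deprecation itself is plain pandas behaviour. Here is a minimal sketch that reproduces it without Spark (my reading of what the {{astype}} call in conversion.py hits, so treat the first call as an assumption):
{code:python}
import pandas as pd

s = pd.Series([pd.Timestamp("2023-01-01")])

# Presumably what conversion.py does today: astype with a unit-less
# datetime64. On pandas 1.5.x this emits the FutureWarning above; on
# pandas 2.x it raises a TypeError, hence it is commented out here.
# s.astype("datetime64")

# What the warning asks for instead; works on both major versions:
s = s.astype("datetime64[ns]")
print(s.dtype)  # datetime64[ns]
{code}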
Note that if we enable Arrow (by setting 
{{config("spark.sql.execution.arrow.pyspark.enabled", "true")}}), the 
warning goes away. I have seen it appear once with Arrow enabled as well, 
but I could not build a reproducible example for that case.
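For completeness, building the session with that flag set (same config key as above) looks like this:
{code:python}
from pyspark.sql import SparkSession

# Workaround: enable Arrow-based conversion so that toPandas() takes
# the Arrow code path, which does not emit this warning.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)
{code}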

I ran the test in a Docker container:
 * Python: 3.10 (base image python:3.10-slim-bullseye)
 * Java: openjdk-17-jre-headless
 * Spark: 3.4.1
 * pandas: 1.5.3

Note that this effectively means that I cannot use Spark with pandas 2.0 
without Arrow enabled.
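In the meantime, the only stopgap I can offer (my own workaround, not an official fix) is a targeted warnings filter on pandas 1.5.x:
{code:python}
import warnings

# Stopgap: silence just this FutureWarning. This only hides the noise on
# pandas 1.5.x; it does not help on pandas 2.x, where the underlying
# unit-less astype call raises instead of warning.
warnings.filterwarnings(
    "ignore",
    message="Passing unit-less datetime64 dtype to .astype is deprecated",
    category=FutureWarning,
)
{code}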


> toPandas() gives FutureWarning when containing columns of datatype timestamp
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-44946
>                 URL: https://issues.apache.org/jira/browse/SPARK-44946
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.1
>            Reporter: Matthias Roels
>            Priority: Major
>


