[ https://issues.apache.org/jira/browse/SPARK-44946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Roels updated SPARK-44946:
-----------------------------------
Description:
When converting a Spark DataFrame into a pandas DataFrame, we get a FutureWarning when the DataFrame contains columns of type {{timestamp}}.

Reproducible example (that you can run locally):
{code:python}
from datetime import datetime

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = pd.DataFrame({"foo": [datetime(2023, 1, 1), datetime(2023, 1, 1)]})
df_sp = spark.createDataFrame(df)
test = df_sp.toPandas()

# warning logged:
# /usr/local/lib/python3.10/site-packages/pyspark/sql/pandas/conversion.py:251: FutureWarning:
# Passing unit-less datetime64 dtype to .astype is deprecated and will raise
# in a future version. Pass 'datetime64[ns]' instead
{code}
Note that if we enable Arrow (by setting {{config("spark.sql.execution.arrow.pyspark.enabled", "true")}}), this warning is gone. Admittedly, I have seen it pop up once with Arrow enabled too, but I could not create a reproducible example of that.

For my test, I ran it in a Docker container:
* Python version: 3.10 (base image python:3.10-slim-bullseye)
* Java: openjdk-17-jre-headless
* Spark: 3.4.1
* pandas: 1.5.3

Note that since this deprecated call raises in pandas 2.0, this effectively means that I cannot use Spark with pandas 2.0 without Arrow enabled.
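For reference, a minimal sketch of the Arrow workaround mentioned above; it assumes a plain local session, and the config key is the one quoted in the description:

{code:python}
from pyspark.sql import SparkSession

# Enable Arrow-based columnar transfer for toPandas()/createDataFrame.
spark = (
    SparkSession.builder
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

# The setting can also be flipped on an already-running session:
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
{code}
With this set, the reproduction above no longer emits the FutureWarning, per the report.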
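To see the underlying deprecation independently of Spark, here is a hedged pandas-only sketch of the call pattern the warning points at ({{.astype}} with a unit-less datetime64 dtype), assuming pandas 1.5.x as in the environment above; pandas 2.x rejects the unit-less dtype outright:

{code:python}
import warnings

import pandas as pd

s = pd.Series(pd.to_datetime(["2023-01-01", "2023-01-02"]))

# Unit-less dtype: under pandas 1.5.x this emits the FutureWarning quoted
# above; under pandas 2.x it raises an exception instead.
try:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        s.astype("datetime64")
    for w in caught:
        print(w.category.__name__, w.message)
except TypeError as exc:
    print("rejected outright:", exc)

# The replacement the warning asks for: an explicit nanosecond unit.
print(s.astype("datetime64[ns]").dtype)
{code}
This suggests the fix the warning itself proposes: the non-Arrow conversion path would need to pass an explicit unit such as 'datetime64[ns]' instead of the bare dtype.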
> toPandas() gives FutureWarning when containing columns of datatype timestamp
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-44946
>                 URL: https://issues.apache.org/jira/browse/SPARK-44946
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.1
>            Reporter: Matthias Roels
>            Priority: Major