[ 
https://issues.apache.org/jira/browse/SPARK-44101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739502#comment-17739502
 ] 

Haejoon Lee commented on SPARK-44101:
-------------------------------------

I would like to reiterate the key decisions regarding the pandas 2.0 upgrade 
here:

pandas 2.0.0, a major release published on April 3, 2023, introduced numerous 
breaking changes. We have therefore decided to postpone addressing these 
breaking changes until the next major release of Spark, version 4.0.0, to 
minimize disruption for our users and provide a smoother upgrade experience.

The pandas 2.0.0 release includes a significant number of updates, such as API 
removals, changes in API behavior, parameter removals, parameter behavior 
changes, and bug fixes. We have planned the following approach for each item:

- {*}API Removals{*}: Removed APIs will remain deprecated in Spark 3.5.0, 
provide appropriate warnings, and will be removed in Spark 4.0.0.

- {*}API Behavior Changes{*}: APIs with changed behavior will retain their 
existing behavior in Spark 3.5.0, provide appropriate warnings, and will be 
aligned with the pandas behavior in Spark 4.0.0.

- {*}Parameter Removals{*}: Removed parameters will remain deprecated in Spark 
3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.

- {*}Parameter Behavior Changes{*}: Parameters with changed behavior will 
retain their existing behavior in Spark 3.5.0, provide appropriate warnings, 
and will be aligned with the pandas behavior in Spark 4.0.0.

- {*}Bug Fixes{*}: Bug fixes, mainly related to correctness issues, will be 
applied in Spark 3.5.0.
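
As a rough illustration of the "remain deprecated, provide appropriate 
warnings" pattern in the list above, here is a minimal Python sketch. The names 
legacy_mean and skip_negative are hypothetical and are not part of the pandas 
API on Spark; the use of FutureWarning is also an assumption, not necessarily 
what the actual PR will do.

```python
import warnings
from typing import Optional, Sequence


def legacy_mean(values: Sequence[float], skip_negative: Optional[bool] = None) -> float:
    """Hypothetical function whose `skip_negative` parameter was removed upstream.

    The parameter keeps working for now but emits a deprecation warning,
    mirroring the "deprecated in 3.5.0, removed in 4.0.0" plan.
    """
    if skip_negative is not None:
        # Warn only when the caller actually passes the deprecated parameter.
        warnings.warn(
            "The 'skip_negative' parameter is deprecated and will be removed in 4.0.0.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
    if skip_negative:
        values = [v for v in values if v >= 0]
    return sum(values) / len(values)
```

Calling legacy_mean([1.0, 2.0, 3.0]) emits no warning, while passing 
skip_negative=True still works but emits a FutureWarning, so existing user 
code keeps running until the parameter is removed.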

*To recap, all breaking changes related to pandas 2.0.0 will be supported in 
Spark 4.0.0,* *and the affected APIs and parameters will remain deprecated 
with appropriate warnings in Spark 3.5.0.*
 
I will submit a PR that deprecates all affected APIs and adds warnings very soon.

Also cc [~panbingkun] [~bjornjorgensen] FYI

> Support pandas 2
> ----------------
>
>                 Key: SPARK-44101
>                 URL: https://issues.apache.org/jira/browse/SPARK-44101
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Pandas API on Spark, PySpark
>    Affects Versions: 3.5.0
>            Reporter: Haejoon Lee
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

