[
https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732932#comment-17732932
]
Haejoon Lee commented on SPARK-43291:
-------------------------------------
The major release of pandas 2.0.0 on April 3, 2023 introduced numerous breaking
changes. We have decided to postpone addressing these breaking changes until the
next major release of Spark, version 4.0.0, to minimize disruption for our users
and provide a smoother upgrade experience.
The pandas 2.0.0 release includes a significant number of updates, such as API
removals, changes in API behavior, parameter removals, parameter behavior
changes, and bug fixes. We have planned the following approach for each item:
- {*}API Removals{*}: Removed APIs will remain deprecated in Spark 3.5.0,
provide appropriate warnings, and will be removed in Spark 4.0.0.
- {*}API Behavior Changes{*}: APIs with changed behavior will retain their
existing behavior in Spark 3.5.0, provide appropriate warnings, and will align
their behavior with pandas in Spark 4.0.0.
- {*}Parameter Removals{*}: Removed parameters will remain deprecated in Spark
3.5.0, provide appropriate warnings, and will be removed in Spark 4.0.0.
- {*}Parameter Behavior Changes{*}: Parameters with changed behavior will
retain their existing behavior in Spark 3.5.0, provide appropriate warnings, and
will align their behavior with pandas in Spark 4.0.0.
- {*}Bug Fixes{*}: Bug fixes, mainly those related to correctness issues, will
be applied in Spark 3.5.0.
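As a rough illustration of the "keep the old behavior, but warn" approach in the list above, here is a minimal sketch using only the Python standard library. The names {{deprecated}}, {{legacy_api}}, {{summarize}}, and {{legacy_flag}} are hypothetical and are not taken from the Spark code base; the actual PR may be structured quite differently.

```python
import functools
import warnings


def deprecated(removed_in: str, reason: str):
    """Hypothetical decorator: warn on every call, keep the old behavior."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"Spark {removed_in}: {reason}",
                FutureWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(removed_in="4.0.0", reason="removed in pandas 2.0.0")
def legacy_api(values):
    # Stand-in for an API that pandas 2.0.0 removed; behavior is unchanged.
    return sorted(values)


def summarize(values, legacy_flag=None):
    # Stand-in for an API whose parameter was removed in pandas 2.0.0:
    # the parameter is still accepted, but passing it emits a warning.
    if legacy_flag is not None:
        warnings.warn(
            "The 'legacy_flag' parameter is deprecated and will be removed "
            "in Spark 4.0.0.",
            FutureWarning,
            stacklevel=2,
        )
    return sum(values)
```

{{FutureWarning}} is used here because Python shows it to end users by default, whereas {{DeprecationWarning}} is hidden in most contexts; which category Spark actually uses is an assumption of this sketch.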
*To recap, all breaking changes related to pandas 2.0.0 will be supported in
Spark 4.0.0, and the affected APIs will remain deprecated with appropriate
warnings in Spark 3.5.0.*
I will submit a PR that deprecates the affected APIs and adds warnings very soon.
> Match behavior for DataFrame.cov on string DataFrame
> ----------------------------------------------------
>
> Key: SPARK-43291
> URL: https://issues.apache.org/jira/browse/SPARK-43291
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> Should enable test below:
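> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
>
> # Runs inside a pandas-on-Spark test case, where self.assert_eq is available.
> pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")],
>                    columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> self.assert_eq(pdf.cov(), psdf.cov())
> {code}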
--
This message was sent by Atlassian Jira
(v8.20.10#820010)