[
https://issues.apache.org/jira/browse/SPARK-43282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haejoon Lee resolved SPARK-43282.
---------------------------------
Resolution: Won't Fix
> Investigate DataFrame.sort_values with pandas behavior.
> -------------------------------------------------------
>
> Key: SPARK-43282
> URL: https://issues.apache.org/jira/browse/SPARK-43282
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark
> Affects Versions: 4.0.0
> Reporter: Haejoon Lee
> Priority: Major
>
> {code:python}
> import pandas as pd
> pdf = pd.DataFrame(
>     {
>         "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
>         "b": pd.Categorical(
>             ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
>         ),
>     },
> )
> pdf.groupby("a").apply(lambda x: x).sort_values(["a"])
> Traceback (most recent call last):
> ...
> ValueError: 'a' is both an index level and a column label, which is ambiguous.
> {code}
> We should investigate whether this is intended behavior or just a bug in
> pandas.
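
The ambiguity named in the error message can be reproduced without groupby. The following is a minimal sketch, not the reporter's code: it assumes a recent pandas release, builds a frame whose index name collides with a column label, and shows one possible way to remove the collision before sorting. The names `ambiguous`, `raised`, and `resolved` are illustrative only.

```python
import pandas as pd

# Illustrative frame: the index is named "a" and a column is also labeled "a".
ambiguous = pd.DataFrame(
    {"a": [3, 1, 2], "b": ["x", "y", "z"]},
    index=pd.Index([3, 1, 2], name="a"),
)

# sort_values cannot tell whether "a" means the index level or the column.
try:
    ambiguous.sort_values(["a"])
    raised = False
except ValueError:
    raised = True

# One possible workaround: drop the colliding index so that "a" refers
# only to the column, then sort.
resolved = ambiguous.reset_index(drop=True).sort_values(["a"])
```

Renaming the index level, for example with `rename_axis`, is an alternative when the index values need to be kept.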