[
https://issues.apache.org/jira/browse/SPARK-46349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Majid Hajiheidari updated SPARK-46349:
--------------------------------------
Description:
Hello everyone,
This is my first contribution to the project. I welcome any feedback and edits
to improve this pull request. Currently, it is possible to create redundant sort
expressions in both the Scala and Python APIs, leading to potentially incorrect
and confusing SQL statements. For example:
Scala:
{code:scala}
spark.range(10).orderBy($"id".desc.asc).show(){code}
Python:
{code:python}
spark.range(10).orderBy(f.desc('id'), ascending=False).show(){code}
Such usage generates SQL like {{ORDER BY id DESC NULLS LAST DESC NULLS LAST}},
which then fails with a non-descriptive error message.
I look forward to your feedback and thank you for considering this contribution.
was:
Hello everyone,
This is my first contribution to the project. I welcome any feedback and edits
to improve this pull request. Currently, it is possible to create redundant sort
expressions in both the Scala and Python APIs, leading to potentially incorrect
and confusing SQL statements. For example:
Scala:
{code:scala}
spark.range(10).orderBy($"id".desc.asc).show(){code}
Python:
{code:python}
spark.range(10).orderBy(f.desc('id'), ascending=False).show(){code}
Such usage generates SQL like {{ORDER BY id DESC NULLS LAST DESC NULLS LAST}},
which then fails with a non-descriptive error message.
I created a pull request to handle this issue. It introduces a constraint in
the SortOrder class ensuring that its child cannot be another SortOrder
instance, which prevents the creation of nested, redundant sort expressions.
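As an illustrative sketch only (plain Python stand-ins, not Spark's actual Catalyst classes), the constraint could look like this:
{code:python}
# Toy model of the proposed constraint; Expression and SortOrder here
# are simplified stand-ins, not Spark's real Catalyst implementation.
class Expression:
    pass

class SortOrder(Expression):
    def __init__(self, child, direction):
        # Reject a SortOrder child so that nested sort expressions
        # such as id.desc.asc cannot be constructed.
        if isinstance(child, SortOrder):
            raise ValueError(
                "The child of a SortOrder cannot be another SortOrder; "
                "remove the redundant sort modifier.")
        self.child = child
        self.direction = direction
{code}
With this check, constructing a sort order over a plain expression works as before, while wrapping an existing sort order fails immediately at construction time instead of producing invalid SQL later.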
Additionally, PySpark's DataFrame.sort has an ascending keyword argument that
can conflict with expressions that already carry a sort order. I've added an
exception handler to produce a more descriptive error message in that case.
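A hypothetical sketch of that check (simplified stand-ins again; the function name and classes are illustrative, not PySpark's real internals):
{code:python}
# Hypothetical sketch of the described check in DataFrame.sort; the
# Expression/SortOrder classes are simplified stand-ins.
class Expression:
    pass

class SortOrder(Expression):
    def __init__(self, child, direction):
        self.child = child
        self.direction = direction

def apply_ascending(cols, ascending=True):
    """Wrap plain columns in a SortOrder, but fail with a descriptive
    message if a column already carries a sort order."""
    ordered = []
    for col in cols:
        if isinstance(col, SortOrder):
            raise ValueError(
                "Column already has a sort order; do not combine it "
                "with the ascending= keyword argument.")
        ordered.append(SortOrder(col, "ASC" if ascending else "DESC"))
    return ordered
{code}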
A test case has been added to verify that no double ordering occurs after this
fix.
I look forward to your feedback and thank you for considering this contribution.
> Prevent Multiple SortOrders for an Expression
> ---------------------------------------------
>
> Key: SPARK-46349
> URL: https://issues.apache.org/jira/browse/SPARK-46349
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 4.0.0
> Reporter: Majid Hajiheidari
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]