[ https://issues.apache.org/jira/browse/SPARK-11135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-11135:
-------------------------------
    Summary: Exchange sort-planning logic incorrectly avoid sorts when existing ordering is non-empty subset of required ordering  (was: Exchange sort-planning logic incorrectly avoid sorts when existing ordering is subset of required ordering)

> Exchange sort-planning logic incorrectly avoid sorts when existing ordering
> is non-empty subset of required ordering
> --------------------------------------------------------------------------
>
>                 Key: SPARK-11135
>                 URL: https://issues.apache.org/jira/browse/SPARK-11135
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> In Spark SQL, the Exchange planner tries to avoid unnecessary sorts in cases
> where the data is already sorted by a superset of the requested sort columns,
> with the requested columns as the leading sort keys. For instance, suppose a
> query calls for an operator's input to be sorted by `a.asc` and the input is
> already sorted by `[a.asc, b.asc]`. In this case, we do not need to re-sort
> the input. The converse is not true: if the query calls for
> `[a.asc, b.asc]`, then `a.asc` alone does not satisfy the ordering
> requirement, so Exchange must plan an additional sort.
> The current Exchange code gets this wrong and skips the sort when the
> existing output ordering is a non-empty subset of the required ordering.
> The fix is simple.
> This bug was introduced in https://github.com/apache/spark/pull/7458, so it
> affects 1.5.0+.
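
The ordering rule described above can be sketched as a prefix check. This is a minimal illustration in Python, not the actual Exchange code: the names `satisfies` and `buggy_satisfies` and the `(column, direction)` tuple representation are hypothetical, and the buggy variant shows only one way a check could go wrong in the manner the issue describes.

```python
from typing import Sequence, Tuple

# Hypothetical representation of one sort key: (column, direction).
SortKey = Tuple[str, str]


def satisfies(existing: Sequence[SortKey], required: Sequence[SortKey]) -> bool:
    """Return True if data sorted by `existing` needs no further sort
    to meet `required`, i.e. `required` is a prefix of `existing`."""
    if len(required) > len(existing):
        return False
    return list(existing[:len(required)]) == list(required)


def buggy_satisfies(existing: Sequence[SortKey], required: Sequence[SortKey]) -> bool:
    """Illustrative faulty check (an assumption, not Spark's code): it only
    compares the shared leading keys, so a shorter, non-empty `existing`
    ordering wrongly counts as satisfying a longer `required` ordering."""
    n = min(len(existing), len(required))
    return n > 0 and list(existing[:n]) == list(required[:n])


a_asc = ("a", "asc")
b_asc = ("b", "asc")

# Existing [a, b] covers required [a]: no extra sort is needed.
no_sort_needed = satisfies([a_asc, b_asc], [a_asc])

# Existing [a] does not cover required [a, b]: a sort must be planned.
sort_skipped_correct = satisfies([a_asc], [a_asc, b_asc])

# The faulty check skips the sort in that same case.
sort_skipped_buggy = buggy_satisfies([a_asc], [a_asc, b_asc])
```

Here `no_sort_needed` and `sort_skipped_buggy` are true while `sort_skipped_correct` is false, which is the discrepancy the issue reports. An empty existing ordering does not trigger the faulty path, consistent with the "non-empty subset" wording in the updated summary.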