cp created SPARK-34937:
--------------------------
Summary: Remove unnecessary sort in FileFormatWriter when only
have one partition
Key: SPARK-34937
URL: https://issues.apache.org/jira/browse/SPARK-34937
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.1
Reporter: cp
Sorting is a very resource-intensive operation. Spark may add a SortExec when
writing partitioned files so that rows are written sequentially, but I think
the sort is unnecessary when there is only one partition.
related file:
{code:java}
org.apache.spark.sql.execution.datasources.FileFormatWriter
{code}
related code:
{code:java}
// We should first sort by partition columns, then bucket id, and finally sorting columns.
val requiredOrdering = partitionColumns ++ bucketIdExpression ++ sortColumns
// the sort order doesn't matter
val actualOrdering = plan.outputOrdering.map(_.child)
val orderingMatched = if (requiredOrdering.length > actualOrdering.length) {
  false
} else {
  requiredOrdering.zip(actualOrdering).forall {
    case (requiredOrder, childOutputOrder) =>
      requiredOrder.semanticEquals(childOutputOrder)
  }
}
{code}
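A rough sketch of the idea, not a patch against Spark. It uses a toy Col class in place of Catalyst expressions so that it runs on its own. Col, OrderingCheck, needsSort and the singlePartition flag are hypothetical names introduced here. How Spark would know that a write targets a single partition is left open. The assumption is that with a single partition value the partition columns are constant, so only the bucket id and sort columns still constrain the ordering.

```scala
// Toy stand-in for a Catalyst expression (assumption: the real code
// compares Expression instances with semanticEquals).
final case class Col(name: String) {
  def semanticEquals(other: Col): Boolean = name == other.name
}

object OrderingCheck {
  // Same prefix check as the quoted snippet: the required ordering must be
  // a prefix of the ordering the child plan already provides.
  def orderingMatched(required: Seq[Col], actual: Seq[Col]): Boolean =
    required.length <= actual.length &&
      required.zip(actual).forall { case (r, a) => r.semanticEquals(a) }

  // Hypothetical extension: when the write is known to target one
  // partition, also accept an ordering that ignores the partition columns.
  def needsSort(
      partitionColumns: Seq[Col],
      bucketId: Option[Col],
      sortColumns: Seq[Col],
      actualOrdering: Seq[Col],
      singlePartition: Boolean): Boolean = {
    val full = partitionColumns ++ bucketId.toSeq ++ sortColumns
    val withoutPartition = bucketId.toSeq ++ sortColumns
    val matched = orderingMatched(full, actualOrdering) ||
      (singlePartition && orderingMatched(withoutPartition, actualOrdering))
    !matched
  }
}
```

With no bucketing and no sort columns, needsSort returns false for an unsorted input when singlePartition is true, and true when it is false. If a bucket id or sort columns are present, the sort is still requested unless the input already satisfies them.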