cp created SPARK-34937:
--------------------------

             Summary: Remove unnecessary sort in FileFormatWriter when only have one partition
                 Key: SPARK-34937
                 URL: https://issues.apache.org/jira/browse/SPARK-34937
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.1
            Reporter: cp


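Sorting is a resource-intensive operation. When writing partitioned files, Spark may add a SortExec so that rows belonging to the same output partition arrive together and can be written sequentially, one file at a time. However, I think this sort is unnecessary when there is only one partition.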
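A minimal Scala sketch of why the sort matters, assuming a simplified writer that keeps a single file open and closes it whenever the partition value changes. `SequentialWriteSketch`, `Row` and `fileOpens` are hypothetical names, not Spark APIs. The sketch counts file opens: unsorted input reopens a partition, sorted input opens each partition once, and input whose rows all share one partition value (one possible reading of "only one partition") is already grouped without any sort.

```scala
// Hypothetical, simplified model of a writer that keeps only one output
// file open at a time. Names are illustrative, not Spark's API.
object SequentialWriteSketch {
  final case class Row(partition: String, value: Int)

  /** Counts how many times a file is opened when rows are written in the
    * given order and the current file is closed whenever the partition
    * value changes. */
  def fileOpens(rows: Seq[Row]): Int = {
    var opens = 0
    var current: Option[String] = None
    for (r <- rows) {
      if (!current.contains(r.partition)) {
        opens += 1
        current = Some(r.partition)
      }
    }
    opens
  }

  def main(args: Array[String]): Unit = {
    val unsorted = Seq(Row("a", 1), Row("b", 2), Row("a", 3))
    val sorted = unsorted.sortBy(_.partition) // stable sort by partition value
    val single = Seq(Row("a", 3), Row("a", 1), Row("a", 2))
    println(fileOpens(unsorted)) // 3: partition "a" is opened twice
    println(fileOpens(sorted))   // 2: one file per partition
    println(fileOpens(single))   // 1: already grouped without sorting
  }
}
```

With a single partition value the rows are trivially contiguous, which is the intuition behind skipping the sort in that case.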


Related file:
{code:java}
org.apache.spark.sql.execution.datasources.FileFormatWriter
{code}
Related code:
{code:java}
// We should first sort by partition columns, then bucket id, and finally sorting columns.
val requiredOrdering = partitionColumns ++ bucketIdExpression ++ sortColumns
// the sort order doesn't matter
val actualOrdering = plan.outputOrdering.map(_.child)
// true only if requiredOrdering is a prefix of the child plan's output ordering
val orderingMatched = if (requiredOrdering.length > actualOrdering.length) {
  false
} else {
  requiredOrdering.zip(actualOrdering).forall {
    case (requiredOrder, childOutputOrder) =>
      requiredOrder.semanticEquals(childOutputOrder)
  }
}
{code}
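To make the check concrete, here is a standalone Scala sketch of the same prefix test. It assumes plain strings stand in for Catalyst expressions and `==` stands in for `semanticEquals`; `OrderingMatchSketch` is a hypothetical name. `orderingMatched` is true only when the required ordering is a prefix of the child plan's output ordering; when it is false, this code path sorts the data before writing.

```scala
// Standalone sketch of the prefix check quoted above. Plain strings stand in
// for Catalyst expressions and == stands in for semanticEquals (assumption).
object OrderingMatchSketch {
  def orderingMatched(required: Seq[String], actual: Seq[String]): Boolean =
    if (required.length > actual.length) false
    else required.zip(actual).forall { case (req, act) => req == act }

  def main(args: Array[String]): Unit = {
    // Partition column "p", no bucketing, no sort columns.
    println(orderingMatched(Seq("p"), Seq("p", "x"))) // true: required is a prefix
    println(orderingMatched(Seq("p"), Seq("x", "p"))) // false: wrong leading column
    println(orderingMatched(Seq("p"), Seq.empty))     // false: child is unordered
    println(orderingMatched(Seq.empty, Seq.empty))    // true: nothing to sort by
  }
}
```

The last case shows why a write with no partition columns and no bucketing never triggers the sort: with nothing required, the check passes trivially.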

--
This message was sent by Atlassian Jira
(v8.3.4#803005)