Cheng Su created SPARK-34253:
--------------------------------

             Summary: Object hash aggregate should not fallback if no more 
input rows
                 Key: SPARK-34253
                 URL: https://issues.apache.org/jira/browse/SPARK-34253
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Cheng Su


Object hash aggregate will fallback to sort-based aggregation based on number 
of keys seen so far [0]. The default config threshold is 128 
(spark.sql.objectHashAggregate.sortBased.fallbackThreshold in [1]). There's an 
edge case we can do better, where we do not fallback if there's no more input 
rows. Suppose the task only has 128 group-by keys in hash ma, we don't need to 
fallback in this case, and we can save the extra sort.

 

[0]: 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala#L161]
 

[1]: 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L1615]
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to