Cheng Su created SPARK-34237:
--------------------------------

             Summary: Add more metrics (fallback, spill) for object hash aggregate
                 Key: SPARK-34237
                 URL: https://issues.apache.org/jira/browse/SPARK-34237
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Cheng Su


The fallback mechanism of object hash aggregate is special: it falls back to 
sort-based aggregation based on the number of keys seen so far [0]. This 
fallback logic is sometimes sub-optimal and leads to unnecessary sorting and 
performance degradation at run time. The first step in helping users and 
developers debug this is to add more related metrics to the UI, e.g. spill 
size and the number of fallbacks to sort-based aggregation. (A spill size 
metric was already added for hash aggregate [1].)

[0]: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala#L161
[1]: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L68
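
To make the fallback behaviour described above concrete, here is a minimal toy model in plain Python. It is not Spark's implementation: AggMetrics, sum_by_key, fallback_threshold, num_fallbacks and spilled_rows are hypothetical names chosen for illustration, and the row count is only a rough stand-in for a real spill size metric. The sketch assumes the fallback triggers once the number of distinct keys in the hash map reaches a threshold while input remains.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class AggMetrics:
    """Hypothetical counters standing in for the proposed UI metrics."""
    num_fallbacks: int = 0  # times the task switched to sort-based aggregation
    spilled_rows: int = 0   # rows routed through the sort path (proxy for spill size)


def sum_by_key(rows, fallback_threshold, metrics):
    """Sum values per key: hash-based first, sort-based after a fallback."""
    rows = list(rows)
    hash_map = {}
    for i, (key, value) in enumerate(rows):
        hash_map[key] = hash_map.get(key, 0) + value
        has_more = i + 1 < len(rows)
        if len(hash_map) >= fallback_threshold and has_more:
            # Assumed trigger: too many distinct keys seen so far.
            # Sort the partial results with the remaining input, then
            # merge runs of equal keys.
            metrics.num_fallbacks += 1
            pending = list(hash_map.items()) + rows[i + 1:]
            metrics.spilled_rows += len(pending)
            pending.sort(key=lambda kv: kv[0])
            return [(k, sum(v for _, v in group))
                    for k, group in groupby(pending, key=lambda kv: kv[0])]
    return sorted(hash_map.items())


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

low = AggMetrics()
print(sum_by_key(rows, fallback_threshold=2, metrics=low), low)

high = AggMetrics()
print(sum_by_key(rows, fallback_threshold=128, metrics=high), high)
```

With a threshold of 2 the toy falls back after the second row even though the remaining rows contain no new keys, which is the kind of unnecessary sort the proposed metrics are meant to expose. With a threshold of 128 both counters stay at zero.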
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

