XiDuo You created SPARK-40594:
---------------------------------
Summary: Eagerly release hashed relation in ShuffledHashJoin
Key: SPARK-40594
URL: https://issues.apache.org/jira/browse/SPARK-40594
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You
ShuffledHashJoin (SHJ) releases the hashed relation it builds only at the end
of the task, using a taskCompletionListener. This is not always good enough
for complex SQL queries. If a SortMergeJoin (SMJ) sits on top of the SHJ, the
hashed relation in the SHJ is effectively leaked: the sort below the SMJ has
already consumed all rows from the SHJ, yet the relation still holds its
memory, so the rows buffered in the SMJ cannot allocate that memory. This
causes unnecessary spills.
This is a common case in multi-join queries, since AQE can convert an SMJ to
an SHJ at runtime.
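The memory interaction described above can be modeled with a toy simulation.
This is a minimal sketch under stated assumptions, not Spark code: the names
TaskMemoryPool, HashedRelation, shuffled_hash_join and run_task are
hypothetical, memory is counted in abstract units, and a "spill" is modeled
simply as a failed allocation. It compares releasing the relation only at task
completion with releasing it eagerly once the streamed side is exhausted.

```python
# Toy model (hypothetical names, not Spark's classes) of an SHJ feeding a
# sort that sits below an SMJ, all sharing one task's memory budget.

class TaskMemoryPool:
    """Tracks memory granted to consumers within one task (abstract units)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def acquire(self, n):
        # Grant only if the request fits; otherwise the caller would spill.
        if self.used + n <= self.capacity:
            self.used += n
            return True
        return False

    def release(self, n):
        self.used -= n


class HashedRelation:
    """Build side of the join; holds `size` units until closed."""
    def __init__(self, pool, keys):
        self.pool, self.keys, self.size, self.closed = pool, set(keys), len(keys), False
        if not pool.acquire(self.size):
            raise MemoryError("cannot build hashed relation")

    def close(self):
        # Idempotent, so both eager release and the completion listener can call it.
        if not self.closed:
            self.pool.release(self.size)
            self.closed = True


def shuffled_hash_join(pool, build_rows, stream_rows, eager_release, listeners):
    relation = HashedRelation(pool, build_rows)
    listeners.append(relation.close)  # release at task completion
    for row in stream_rows:
        if row in relation.keys:
            yield row
    # Streamed side exhausted: the relation is no longer needed.
    if eager_release:
        relation.close()


def run_task(eager_release, capacity=10):
    pool = TaskMemoryPool(capacity)
    listeners = []  # stands in for task completion listeners
    joined = shuffled_hash_join(pool, list(range(6)), list(range(10)),
                                eager_release, listeners)
    # The sort below the SMJ consumes every joined row (its own memory is not modeled).
    sorted_rows = sorted(joined)
    # The SMJ then buffers 6 rows at 1 unit each; a failed grant counts as a spill.
    spills = sum(1 for _ in range(6) if not pool.acquire(1))
    for close in listeners:  # task completion
        close()
    return sorted_rows, spills


if __name__ == "__main__":
    print("release at task end:", run_task(eager_release=False))
    print("eager release:      ", run_task(eager_release=True))
```

With a 10-unit budget, the 6-unit relation released only at task completion
leaves room for just 4 of the 6 buffered rows, so 2 allocations fail; releasing
it as soon as the streamed input is exhausted lets all 6 succeed. Keeping the
completion listener as a fallback (with an idempotent close) still covers tasks
that stop before the stream is fully consumed.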
--
This message was sent by Atlassian Jira
(v8.20.10#820010)