Utkarsh Agarwal created SPARK-49977:
---------------------------------------
Summary: Use stack-based iterative computation to avoid creating many Scala List objects for deep expression trees
Key: SPARK-49977
URL: https://issues.apache.org/jira/browse/SPARK-49977
Project: Spark
Issue Type: Task
Components: Optimizer
Affects Versions: 4.0.0
Reporter: Utkarsh Agarwal
In some use cases with deep expression trees, the driver's heap shows many
{{scala.collection.immutable.$colon$colon}} objects (the JVM class name of
Scala's {{::}} list cell). These objects are allocated because of deep
recursion in the {{gatherCommutative}} method, which calls {{flatMap}}
recursively; each {{flatMap}} invocation creates a new temporary Scala
collection. Our suspicion is based on the following stack trace (more than 1K
lines) from a thread in the driver, truncated here for brevity:
{code:java}
"HiveServer2-Background-Pool: Thread-9867" #9867 daemon prio=5 os_prio=0 tid=0x00007f35080bf000 nid=0x33e7 runnable [0x00007f3393372000]
   java.lang.Thread.State: RUNNABLE
	at scala.collection.immutable.List$Appender$1.apply(List.scala:350)
	at scala.collection.immutable.List$Appender$1.apply(List.scala:341)
	at scala.collection.immutable.List.flatMap(List.scala:431)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.gatherCommutative(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.$anonfun$gatherCommutative$1(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression$$Lambda$5280/143713747.apply(Unknown Source)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	....
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.gatherCommutative(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.$anonfun$gatherCommutative$1(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression$$Lambda$5280/143713747.apply(Unknown Source)
	at scala.collection.immutable.List.flatMap(List.scala:366)
	....
{code}
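To make the proposed change concrete, here is a minimal sketch contrasting a recursive {{flatMap}}-based gather with a stack-based iterative one. It is not Spark's implementation: {{Expr}}, {{Leaf}}, {{Add}}, {{Mul}}, {{GatherSketch}}, {{gatherRecursive}} and {{gatherIterative}} are hypothetical names on a simplified tree, only {{Add}} is treated as the commutative operator being flattened, and the sketch assumes Scala 2.13 (where {{mutable.Stack}} is not deprecated).

{code:scala}
import scala.collection.mutable

// Hypothetical, simplified expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Mul(left: Expr, right: Expr) extends Expr

object GatherSketch {
  // Recursive shape, similar to what the stack trace suggests: every Add node
  // calls flatMap, which allocates a temporary List at each level, and the
  // JVM call stack grows with the depth of the tree.
  def gatherRecursive(e: Expr): List[Expr] = e match {
    case Add(l, r) => List(l, r).flatMap(gatherRecursive)
    case other     => other :: Nil
  }

  // Iterative shape: an explicit heap-allocated stack replaces recursion, and a
  // single ListBuffer collects the operands, so no intermediate List is built
  // per tree level and depth is no longer bounded by the thread stack size.
  def gatherIterative(e: Expr): List[Expr] = {
    val out = mutable.ListBuffer.empty[Expr]
    val stack = mutable.Stack[Expr](e)
    while (stack.nonEmpty) {
      stack.pop() match {
        case Add(l, r) =>
          // Push right before left so the left operand is visited first,
          // preserving the same left-to-right order as the recursive version.
          stack.push(r)
          stack.push(l)
        case other =>
          out += other
      }
    }
    out.toList
  }
}
{code}

Both functions return the same operands in the same order; for a left-deep chain of 100,000 {{Add}} nodes, {{gatherIterative}} returns all 100,001 leaves, while {{gatherRecursive}} may throw {{StackOverflowError}} with default thread stack sizes.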