Vsevolod Stepanov created SPARK-45136:
-----------------------------------------
Summary: Improve ClosureCleaner to support closures defined in
Ammonite REPL
Key: SPARK-45136
URL: https://issues.apache.org/jira/browse/SPARK-45136
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0, 3.5.1
Reporter: Vsevolod Stepanov
ConnectRepl uses the Ammonite REPL with CodeClassWrapper to run Scala code. This
means that each code cell is wrapped in a separate object. If multiple
variables are defined in the same cell / code block, a closure that references
one of them captures the others as well, which increases the serialized UDF
payload size or makes the UDF non-serializable.
For example, this code
{code:scala}
// cell 1
{
  val x = 100
  val y = new NonSerializable
}
// cell 2
spark.range(10).map(i => i + x).agg(sum("value")).collect()
{code}
will fail because the lambda captures both `x` and `y`, which are defined in
the same wrapper object, and `y` is not serializable.
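
The capture can be reproduced without Spark or Ammonite. The sketch below is
only an illustration under stated assumptions: `CellWrapper`, `CaptureDemo`
and `isSerializable` are hypothetical names, and `CellWrapper` is a
hand-written stand-in for the wrapper generated around cell 1, assumed here to
be `Serializable`. It uses plain Java serialization rather than Spark's
closure handling.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a class that cannot be serialized.
class NonSerializable

// Hypothetical stand-in for the wrapper generated around "cell 1".
// Assumption: the wrapper itself is Serializable.
class CellWrapper extends Serializable {
  val x = 100
  val y = new NonSerializable

  // Reading the field `x` means reading `this.x`, so the lambda captures
  // the whole wrapper instance, including `y`.
  def addX: Long => Long = i => i + x
}

object CaptureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val cell = new CellWrapper

    // Captures the wrapper, and with it the non-serializable `y`.
    println(isSerializable(cell.addX)) // false

    // Captures only a local Int copy, so nothing else is dragged along.
    val localX = cell.x
    val addLocal: Long => Long = i => i + localX
    println(isSerializable(addLocal)) // true
  }
}
```

The second lambda is there only for contrast: it shows that `x` alone is
harmless, and that the failure comes from the wrapper instance being captured.
It is not the change this issue asks for, which is for ClosureCleaner to
handle such closures itself.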