Vsevolod Stepanov created SPARK-45136:
-----------------------------------------

             Summary: Improve ClosureCleaner to support closures defined in 
Ammonite REPL
                 Key: SPARK-45136
                 URL: https://issues.apache.org/jira/browse/SPARK-45136
             Project: Spark
          Issue Type: Improvement
          Components: Connect
    Affects Versions: 4.0.0, 3.5.1
            Reporter: Vsevolod Stepanov


ConnectRepl uses the Ammonite REPL with CodeClassWrapper to run Scala code, 
which means that each code cell is wrapped in a separate object. If multiple 
variables are defined in the same cell / code block, a closure that uses one 
of them captures the others as well, increasing the serialized UDF payload 
size or making the UDF non-serializable.

For example, this code
{code:scala}
// cell 1 
{
  val x = 100
  val y = new NonSerializable
}

// cell 2
spark.range(10).map(i => i + x).agg(sum("value")).collect(){code}
will fail because the lambda captures both `x` and `y`, as they are defined in 
the same wrapper object, and `y` is not serializable.
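
The capture can be reproduced outside the REPL with a minimal sketch. It is a simplified model under stated assumptions, not Ammonite's actual generated code and not ClosureCleaner's logic: `Cell1Wrapper`, `NonSerializable`, `ClosureCaptureDemo` and `isSerializable` are hypothetical names, and plain Java serialization stands in for UDF serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a class that cannot be Java-serialized.
class NonSerializable

// Hypothetical stand-in for the wrapper generated around "cell 1".
// With a class-based wrapper, x and y become fields of one instance.
class Cell1Wrapper extends Serializable {
  val x = 100
  val y = new NonSerializable

  // Reads the field x, so the lambda captures `this`, and with it y.
  def capturesWrapper: Long => Long = i => i + x

  // Copies x into a local first, so the lambda captures only an Int.
  def capturesOnlyX: Long => Long = {
    val localX = x
    i => i + localX
  }
}

object ClosureCaptureDemo {
  // Returns true if Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val cell = new Cell1Wrapper
    println(s"capturesWrapper serializable: ${isSerializable(cell.capturesWrapper)}")
    println(s"capturesOnlyX serializable:   ${isSerializable(cell.capturesOnlyX)}")
  }
}
```

In this model the first closure drags in the whole wrapper instance, including `y`, while the second carries only the copied value. The improvement requested here would let ClosureCleaner reach the second behaviour without the user having to copy variables by hand.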



