Alexandr Shames created TOREE-560:
-------------------------------------

             Summary: Serialization issue when using a function defined in 
another cell with RDD transformations in Toree
                 Key: TOREE-560
                 URL: https://issues.apache.org/jira/browse/TOREE-560
             Project: TOREE
          Issue Type: Bug
          Components: Kernel
         Environment: Spark 3.4.3
Scala 2.12.18

            Reporter: Alexandr Shames


*Description:*
When a function defined in one cell is used in an RDD transformation in 
another cell, Toree fails to serialize the function correctly. This happens 
even with trivial functions and small RDDs.

*Disclaimer:*
_This is my first bug report, so I apologize if I missed any details or made 
mistakes. I'm still learning, and any feedback is appreciated! :)_

*Steps to Reproduce:*
 # Define a function in one cell:
{{val func = (x: Int) => x}}
 # In another cell, apply this function to an RDD:
{{sc.parallelize(List(1, 2, 3)).map(s => func(s))}}
(Here, {{sc}} is the SparkContext.)

*Expected Behavior:*
The RDD transformation should work correctly, and the function should be 
serialized and executed on the executors.

*Actual Behavior:*
An error occurs, indicating a serialization issue. The function is not properly 
serialized when defined in a separate cell.
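
To narrow down which object fails to serialize, a small probe like the one below could be run in the second cell. This is only a sketch: {{serializationError}} is a hypothetical helper, not part of Toree or Spark, and it assumes the default setup in which Spark serializes closures with plain Java serialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical diagnostic helper (not a Toree or Spark API).
// Attempts plain Java serialization of `value` and returns the message of the
// NotSerializableException (the name of the offending class), or None if
// serialization succeeds.
def serializationError(value: AnyRef): Option[String] = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try {
    out.writeObject(value)
    None
  } catch {
    case e: NotSerializableException => Some(e.getMessage)
  } finally {
    out.close()
  }
}
```

Calling {{serializationError(func)}} and {{serializationError((s: Int) => func(s))}} in the second cell may show whether the function itself or the closure that references it is the part that cannot be serialized.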

*Workaround:*
Defining the function in the same cell where it is used resolves the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
