[
https://issues.apache.org/jira/browse/TOREE-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandr Shames updated TOREE-560:
----------------------------------
Attachment: stacktrace.txt
> Serialization issue when using a function defined in another cell with RDD
> transformations in Toree
> ---------------------------------------------------------------------------------------------------
>
> Key: TOREE-560
> URL: https://issues.apache.org/jira/browse/TOREE-560
> Project: TOREE
> Issue Type: Bug
> Components: Kernel
> Environment: Spark 3.4.3
> Scala 2.12.18
> Reporter: Alexandr Shames
> Priority: Minor
> Attachments: stacktrace.txt
>
>
> *Description:*
> When a function is defined in one cell and then used inside an RDD
> transformation in another cell, Toree fails to serialize the function
> correctly. This happens even with simple functions and RDDs.
>
> *Disclaimer:*
> _This is my first bug report, so I apologize if I missed any details or made
> mistakes. I'm still learning, and any feedback is appreciated! :)_
>
> *Steps to Reproduce:*
> # Define a function in one cell:
> {{val func = (x: Int) => x}}
> # In another cell, apply this function to an RDD:
> {{sc.parallelize(List(1,2,3)).map{ (s) => (func(s)) }}}
> (Here, {{sc}} is the SparkContext.)
>
> *Expected Behavior:*
> The RDD transformation should work correctly, and the function should be
> serialized and executed on the executors.
>
> *Actual Behavior:*
> The transformation fails with an error indicating a serialization issue. The
> function is not serialized correctly when it is defined in a separate cell.
> The stack trace is attached as stacktrace.txt.
>
> *Workaround:*
> Defining the function in the same cell where it is used resolves the issue.
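>
> A minimal sketch of the workaround as a single notebook cell. It assumes a
> Toree Scala kernel in which {{sc}} is the predefined SparkContext; the
> {{collect()}} call and the {{result}} name are additions for illustration
> (not part of the original steps) that force the transformation to run:
> {code:scala}
> // Workaround sketch: define the function in the same cell that uses it.
> // Assumes `sc` is the SparkContext provided by the Toree kernel.
> val func = (x: Int) => x
>
> // collect() is added only to force evaluation; `result` is a hypothetical name.
> val result = sc.parallelize(List(1, 2, 3)).map { s => func(s) }.collect()
> {code}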
--
This message was sent by Atlassian Jira
(v8.20.10#820010)