[ https://issues.apache.org/jira/browse/TOREE-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandr Shames updated TOREE-560:
----------------------------------
Description:
*Description:*
When a function defined in one cell is used in an RDD transformation in
another cell, Toree fails to serialize the function and the transformation
fails with a serialization error. This happens even with trivial functions and RDDs.
*Disclaimer:*
_This is my first bug report, so I apologize if I missed any details or made
mistakes. I'm still learning, and any feedback is appreciated! :)_
*Steps to Reproduce:*
# Define a function in one cell:
{{val func = (x: Int) => x}}
# In another cell, apply this function to an RDD:
{{sc.parallelize(List(1, 2, 3)).map(s => func(s))}} (here, {{sc}} is the SparkContext)
*Expected Behavior:*
The RDD transformation should work correctly, and the function should be
serialized and executed on the executors.
*Actual Behavior:*
An error indicating a serialization problem is raised: the function is not
serialized correctly when it is defined in a separate cell.
The stacktrace is attached as stacktrace.txt.
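To help narrow down where serialization breaks, a driver-side check can be run in the notebook before calling {{map}}. The sketch below is a hypothetical helper ({{SerializationCheck}}, {{NotSerializableHolder}} and {{SerializationCheckDemo}} are illustrative names, not Toree or Spark APIs). It assumes only that Spark's default closure serializer is based on Java serialization, so a closure that fails {{ObjectOutputStream.writeObject}} here would likely fail Spark's check as well. It does not exercise deserialization on the executors, so a passing result does not rule out an executor-side failure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical diagnostic helper (not a Toree or Spark API). It only attempts
// Java serialization on the driver; it never deserializes on executors.
object SerializationCheck {
  // Returns Right(byteCount) if `value` serializes, otherwise Left(message),
  // where the message names the first non-serializable class encountered.
  def serializes(value: AnyRef): Either[String, Int] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(value)
      out.flush()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(s"not serializable: ${e.getMessage}")
    } finally {
      out.close()
    }
  }
}

// Illustrative class that does NOT extend Serializable, used to show the failure mode.
class NotSerializableHolder {
  val f: Int => Int = x => x
}

object SerializationCheckDemo {
  def main(args: Array[String]): Unit = {
    // A lambda that captures nothing serializes fine.
    val plain: Int => Int = x => x
    println(SerializationCheck.serializes(plain)) // a Right(...)

    // A lambda that captures a non-serializable object fails.
    val holder = new NotSerializableHolder
    println(SerializationCheck.serializes((s: Int) => holder.f(s))) // a Left(...)
  }
}
```

In the notebook, one could paste {{SerializationCheck}} into a cell and then run {{SerializationCheck.serializes((s: Int) => func(s))}} in the cell that uses {{func}}. A {{Left}} result would name a non-serializable class captured by the closure (possibly something generated by the interpreter for the earlier cell; this is a guess, not a confirmed cause), which could be useful to attach alongside the stacktrace.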
*Workaround:*
Defining the function in the same cell where it is used resolves the issue.
was:
*Description:*
When a function defined in one cell is used in an RDD transformation in
another cell, Toree fails to serialize the function and the transformation
fails with a serialization error. This happens even with trivial functions and RDDs.
*Disclaimer:*
_This is my first bug report, so I apologize if I missed any details or made
mistakes. I'm still learning, and any feedback is appreciated! :)_
*Steps to Reproduce:*
# Define a function in one cell:
{{val func = (x: Int) => x}}
# In another cell, apply this function to an RDD:
{{sc.parallelize(List(1, 2, 3)).map(s => func(s))}}
Here, {{sc}} is the SparkContext.
*Expected Behavior:*
The RDD transformation should work correctly, and the function should be
serialized and executed on the executors.
*Actual Behavior:*
An error indicating a serialization problem is raised: the function is not
serialized correctly when it is defined in a separate cell.
The stacktrace is attached as stacktrace.txt.
*Workaround:*
Defining the function in the same cell where it is used resolves the issue.
> Serialization issue when using a function defined in another cell with RDD
> transformations in Toree
> ---------------------------------------------------------------------------------------------------
>
> Key: TOREE-560
> URL: https://issues.apache.org/jira/browse/TOREE-560
> Project: TOREE
> Issue Type: Bug
> Components: Kernel
> Environment: Spark 3.4.3, Scala 2.12.18
> Reporter: Alexandr Shames
> Priority: Minor
> Attachments: stacktrace.txt
>