[ https://issues.apache.org/jira/browse/TOREE-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandr Shames updated TOREE-560:
----------------------------------
    Description: 
*Description:*
When using a function defined in one cell and applying it to an RDD 
transformation in another cell, Toree fails to serialize the function 
correctly. This happens even with simple functions and RDDs.

 

*Disclaimer:*
_This is my first bug report, so I apologize if I missed any details or made 
mistakes. I'm still learning, and any feedback is appreciated! :)_

 

*Steps to Reproduce:*
 # Define a function in one cell:
{{val func = (x: Int) => x}}
 # In another cell, apply this function to an RDD:
sc.parallelize(List(1,2,3)).map \{ (s) => (func(s)) } // sc is the SparkContext

 

*Expected Behavior:*
The RDD transformation should work correctly, and the function should be 
serialized and executed on the executors.

*Actual Behavior:*
An error occurs that points to a serialization problem: the function is not 
serialized correctly when it is defined in a separate cell.
The stack trace is attached as stacktrace.txt.
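The following is a minimal, self-contained Scala sketch, not Toree or Spark code, of the kind of check that fails here. Before shipping the closure passed to {{map}} to the executors, Spark serializes it with Java serialization and typically reports "Task not serializable" if anything the closure captures cannot be written. The helper {{isJavaSerializable}} and the class {{NonSerializableHolder}} are hypothetical names used only for illustration. The holder stands in for whatever non-serializable state the closure might reach. One possible explanation, not confirmed by the attached stack trace, is that the closure in the second cell reaches {{func}} through the wrapper object that the REPL generates for the first cell, and that wrapper is not serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerializationCheck {
  // Returns true if `obj` can be written with plain Java serialization.
  // This is roughly the check Spark applies to RDD closures before shipping them.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Hypothetical stand-in for non-serializable state (e.g. a REPL cell wrapper).
  class NonSerializableHolder { val offset = 0 }

  def main(args: Array[String]): Unit = {
    // Same shape as the function from the reproduction steps: captures nothing.
    val func = (x: Int) => x

    // A closure that captures a non-serializable object.
    val holder = new NonSerializableHolder
    val capturing = (x: Int) => x + holder.offset

    println(isJavaSerializable(func))      // true
    println(isJavaSerializable(capturing)) // false
  }
}
```

A bare function literal such as {{(x: Int) => x}} serializes on its own, so if the closure built in the second cell fails this kind of check, the problem most likely lies in what that closure captures rather than in {{func}} itself.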

*Workaround:*
Defining the function in the same cell where it is used avoids the issue.
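For comparison, here is a minimal standalone sketch of the documented workaround. It assumes spark-core 3.4.x is on the classpath and uses a local master. It is a compiled application rather than a Toree notebook, so it does not reproduce the cell boundary. It only shows the shape of the workaround: {{func}} is defined in the same scope that builds the RDD transformation. The object name {{SameCellWorkaround}} and the app name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SameCellWorkaround {
  // Defines the function and uses it in one scope, the analogue of a single cell.
  def run(sc: SparkContext): Array[Int] = {
    val func = (x: Int) => x
    sc.parallelize(List(1, 2, 3)).map { s => func(s) }.collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master for illustration; in Toree, `sc` is provided by the kernel.
    val conf = new SparkConf().setAppName("toree-560-workaround").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(run(sc).mkString(",")) // prints 1,2,3
    } finally {
      sc.stop()
    }
  }
}
```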

  was:
*Description:*
When using a function defined in one cell and applying it to an RDD 
transformation in another cell, Toree fails to serialize the function 
correctly. This happens even with simple functions and RDDs.

 

*Disclaimer:*
_This is my first bug report, so I apologize if I missed any details or made 
mistakes. I'm still learning, and any feedback is appreciated! :)_

 

*Steps to Reproduce:*
 # Define a function in one cell:
{{val func = (x: Int) => x}}
 # In another cell, apply this function to an RDD:
sc.parallelize(List(1,2,3)).map \{ (s) => (func(s)) }
{{// Here, sc}} is the SparkContext.

 

*Expected Behavior:*
The RDD transformation should work correctly, and the function should be 
serialized and executed on the executors.

*Actual Behavior:*
An error occurs, indicating a serialization issue. The function is not properly 
serialized when defined in a separate cell.
Stacktrace is attached in stacktrace.txt

*Workaround:*
Defining the function in the same cell where it is used resolves the issue


> Serialization issue when using a function defined in another cell with RDD 
> transformations in Toree
> ---------------------------------------------------------------------------------------------------
>
>                 Key: TOREE-560
>                 URL: https://issues.apache.org/jira/browse/TOREE-560
>             Project: TOREE
>          Issue Type: Bug
>          Components: Kernel
>         Environment: Spark 3.4.3
> Scala 2.12.18
>            Reporter: Alexandr Shames
>            Priority: Minor
>         Attachments: stacktrace.txt
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
