[ 
https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27629:
------------------------------------

    Assignee: Apache Spark

> Prevent Unpickler from intervening each unpickling
> --------------------------------------------------
>
>                 Key: SPARK-27629
>                 URL: https://issues.apache.org/jira/browse/SPARK-27629
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Liang-Chi Hsieh
>            Assignee: Apache Spark
>            Priority: Major
>
> SPARK-27612 reported a correctness issue: when protocol 4 is used to pickle
> Python objects, the unpickled objects are wrong. A temporary fix was
> proposed that avoids using the highest protocol.
> Opcodes.MEMOIZE was found among the opcodes emitted under protocol 4 and
> is the suspected cause of this issue.
> A deeper dive found that Opcodes.MEMOIZE stores objects into an internal
> map of the Unpickler object. We use a single Unpickler object to unpickle
> serialized Python bytes. If the map is not cleared, the stored objects
> interfere with the next round of unpickling.
> We have two options:
> 1. Continue to reuse the Unpickler, but call its close after each unpickling.
> 2. Do not reuse the Unpickler; create a new Unpickler object for each unpickling.
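
The description above can be illustrated with a minimal sketch in Python. The first part uses the standard pickletools module to list the opcodes that CPython emits under protocol 2 and protocol 4. The second part is a toy model only: ToyUnpickler, its two instructions, and its close method are hypothetical names made up for illustration, not the code of the Unpickler that Spark uses. It assumes MEMOIZE stores an object at the slot given by the current memo size, which is how the opcode is defined in CPython's pickle.

```python
import pickle
import pickletools


def opcode_names(data):
    """Return the opcode names found in a pickle byte string."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


s = "spark"
value = [s, s]  # the same object twice, so the pickler consults its memo

# Protocol 2 writes memo slots with an explicit index (BINPUT).
names_v2 = opcode_names(pickle.dumps(value, protocol=2))
# Protocol 4 writes MEMOIZE, whose slot is implied by the current memo size.
names_v4 = opcode_names(pickle.dumps(value, protocol=4))
print(names_v2)
print(names_v4)


class ToyUnpickler:
    """Hypothetical stand-in for a reusable unpickler (illustration only).

    It understands two made-up instructions:
      ("MEMOIZE", obj)  store obj at index len(memo)
      ("GET", i)        return memo[i]
    """

    def __init__(self):
        self.memo = {}

    def load(self, instructions):
        result = None
        for name, arg in instructions:
            if name == "MEMOIZE":
                # The slot depends on how many entries the memo already holds.
                self.memo[len(self.memo)] = arg
                result = arg
            elif name == "GET":
                result = self.memo[arg]
        return result

    def close(self):
        # Clearing the memo makes the next stream start from slot 0 again.
        self.memo.clear()


first = [("MEMOIZE", "first"), ("GET", 0)]
second = [("MEMOIZE", "second"), ("GET", 0)]

# Reuse without clearing: the second stream reads a stale slot 0.
reused = ToyUnpickler()
reused.load(first)
stale = reused.load(second)

# Option 1: reuse, but clear the memo after each unpickling.
cleared = ToyUnpickler()
cleared.load(first)
cleared.close()
after_close = cleared.load(second)

# Option 2: create a new unpickler for each unpickling.
fresh = ToyUnpickler().load(second)

print(stale, after_close, fresh)
```

In this toy model both options give the correct result for the second stream. The difference is whether callers must remember to clear state after every call or pay for a new object each time.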



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
