[ https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27629:
------------------------------------

    Assignee: Apache Spark

> Prevent Unpickler from intervening each unpickling
> --------------------------------------------------
>
>                 Key: SPARK-27629
>                 URL: https://issues.apache.org/jira/browse/SPARK-27629
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Liang-Chi Hsieh
>            Assignee: Apache Spark
>            Priority: Major
>
> In SPARK-27612, a correctness issue was reported: when pickle protocol 4 is
> used to pickle Python objects, the unpickled objects are wrong. A temporary
> fix was proposed that avoids using the highest protocol.
> We found that Opcodes.MEMOIZE appears among the opcodes emitted under
> protocol 4, and suspected it of causing this issue.
> A deeper dive found that Opcodes.MEMOIZE stores objects into an internal map
> of the Unpickler object. We use a single Unpickler object to unpickle
> serialized Python bytes, so if the map is not cleared, objects stored during
> one round of unpickling interfere with the next round.
> We have two options:
> 1. Continue to reuse the Unpickler, but call its close() after each
> unpickling.
> 2. Stop reusing the Unpickler, and create a new Unpickler object for each
> unpickling.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
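The memo interference described in SPARK-27629 can be illustrated with a minimal, hypothetical toy model. `ToyUnpickler`, `msg`, and the `PUSH`/`PAIR` ops are invented for illustration. `MEMOIZE`, `PUT` and `GET` loosely mirror pickle's MEMOIZE, BINPUT and BINGET opcodes, but this is not Pyrolite or CPython code. The sketch assumes each serialized message comes from a fresh pickler, so its memo indices start at 0. MEMOIZE (protocol 4) stores at the implicit index `len(memo)`, while BINPUT (older protocols) stores at an explicit index.

```python
# Hypothetical toy model of unpickler memo handling. Opcode names loosely
# mirror pickle's MEMOIZE / BINPUT / BINGET; this is NOT Pyrolite or CPython code.


class ToyUnpickler:
    def __init__(self):
        # As in a real unpickler, the memo lives on the object, not on one load() call.
        self.memo = {}

    def load(self, ops):
        stack = []
        for op, *args in ops:
            if op == "PUSH":
                stack.append(args[0])
            elif op == "MEMOIZE":
                # Protocol 4 style: implicit index = current memo size.
                self.memo[len(self.memo)] = stack[-1]
            elif op == "PUT":
                # Older-protocol style (BINPUT): explicit index chosen by the pickler.
                self.memo[args[0]] = stack[-1]
            elif op == "GET":
                stack.append(self.memo[args[0]])
            elif op == "PAIR":
                second = stack.pop()
                first = stack.pop()
                stack.append((first, second))
        return stack.pop()

    def close(self):
        # Option 1 from the issue: clear the memo between unpicklings.
        self.memo.clear()


def msg(value, store_op):
    # A fresh pickler numbers its memo from 0: store `value`,
    # then refer back to index 0 to build (value, value).
    store = ("MEMOIZE",) if store_op == "MEMOIZE" else ("PUT", 0)
    return [("PUSH", value), store, ("GET", 0), ("PAIR",)]


reused = ToyUnpickler()
print(reused.load(msg("a", "MEMOIZE")))  # ('a', 'a')
print(reused.load(msg("b", "MEMOIZE")))  # ('b', 'a')  <- stale entry from the previous round

fresh = ToyUnpickler()                   # Option 2: a new unpickler per message
print(fresh.load(msg("b", "MEMOIZE")))   # ('b', 'b')
```

In this toy model, the second MEMOIZE lands at index 1 because index 0 still holds `'a'` from the previous round, so `GET 0` returns the stale object. With `"PUT"` the explicit index 0 overwrites the stale entry, so a reused unpickler still returns `('b', 'b')`. This is consistent with the workaround of not using the highest protocol.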