Josh Rosen created SPARK-7766:
---------------------------------

             Summary: KryoSerializerInstance re-use is not safe when auto-reset 
is disabled
                 Key: SPARK-7766
                 URL: https://issues.apache.org/jira/browse/SPARK-7766
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.4.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Blocker


SPARK-3386 modified the shuffle write path to re-use serializer instances 
across multiple calls to DiskBlockObjectWriter.  It turns out that this 
introduced a very rare bug when using KryoSerializer: if auto-reset is disabled 
and reference-tracking is enabled, the same serializer instance is used to 
write multiple output streams without {{reset()}} being called between writes.  
As a result, objects in one file may contain references to objects that were 
written to previous files, which can cause errors during deserialization.

The fix should be simple: add {{reset}} calls at the end of {{serialize}} and 
{{serializeStream}}.
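
To make the failure mode concrete, here is a minimal toy model in Python.  It 
is only a sketch under stated assumptions: {{ToySerializer}}, 
{{deserialize_stream}} and {{write_two_files}} are hypothetical names invented 
for illustration and are not Kryo's or Spark's API.  The toy serializer tracks 
references the way a Kryo instance with references enabled and auto-reset 
disabled is assumed to, and the {{reset_after}} flag stands in for the proposed 
{{reset}} call at the end of each write.

```python
class ToySerializer:
    """Toy reference-tracking serializer. Hypothetical; not Kryo's or Spark's API."""

    def __init__(self):
        # id(obj) -> (reference number, obj). Holding obj keeps its id() stable.
        self._seen = {}

    def reset(self):
        """Forget every object written so far."""
        self._seen.clear()

    def serialize_stream(self, objects, reset_after):
        """Encode one output stream (one "file") as a list of records."""
        records = []
        for obj in objects:
            entry = self._seen.get(id(obj))
            if entry is not None:
                # Already written by this instance: emit a back-reference.
                records.append(("ref", entry[0]))
            else:
                self._seen[id(obj)] = (len(self._seen), obj)
                records.append(("obj", obj))
        if reset_after:
            self.reset()
        return records


def deserialize_stream(records):
    """Decode one stream with fresh state, the way a reader of one file would."""
    table = []   # objects seen so far in THIS stream, indexed by reference number
    result = []
    for kind, payload in records:
        if kind == "obj":
            table.append(payload)
            result.append(payload)
        else:
            # Raises IndexError if the target was written to another stream.
            result.append(table[payload])
    return result


def write_two_files(reset_after):
    """Write the same object to two separate streams with one serializer instance."""
    serializer = ToySerializer()
    shared = ["some record"]
    first = serializer.serialize_stream([shared], reset_after)
    second = serializer.serialize_stream([shared], reset_after)
    return first, second


# Without reset, the second file holds only a dangling back-reference.
_, second = write_two_files(reset_after=False)
try:
    deserialize_stream(second)
except IndexError:
    print("no reset: second file cannot be read on its own")

# With reset at the end of each write, every file is self-contained.
_, second = write_two_files(reset_after=True)
print("with reset:", deserialize_stream(second))
```

In this model the reader of the second file starts with an empty reference 
table, so a back-reference to an object that only exists in the first file has 
nothing to resolve against.  Resetting after every write keeps each stream 
self-contained, at the cost of re-writing objects that are shared across 
streams.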

Thanks to John Carrino for reporting this issue on GitHub: 
https://github.com/apache/spark/pull/5606#issuecomment-103995103


