[ 
https://issues.apache.org/jira/browse/SPARK-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7873.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

> Serializer re-use + Kryo autoReset disabled leads to ArrayIndexOutOfBounds 
> exception in sort-shuffle bypassMergeSort path
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7873
>                 URL: https://issues.apache.org/jira/browse/SPARK-7873
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>             Fix For: 1.4.0
>
>
> This is a somewhat obscure bug, but I think that it will seriously impact 
> KryoSerializer users whose custom registrators disable auto-reset.  
> With auto-reset disabled, some of our shuffle paths break because they 
> end up creating multiple OutputStreams from the same shared 
> SerializerInstance (which is unsafe).  To illustrate this, the following 
> test fails in 1.4:
> {code}
> class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
>   conf.set("spark.serializer", classOf[KryoSerializer].getName)
>   conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)
>   test("sort-shuffle with bypassMergeSort") {
>     val myObject = ("Hello", "World")
>     assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet ===
>       Set(myObject))
>   }
> }
> {code}
> This was introduced by a patch (SPARK-3386) that enables serializer re-use 
> in some of the shuffle paths, since constructing new serializer instances is 
> actually pretty costly for KryoSerializer.  We had already fixed another 
> related corner-case bug (SPARK-7766), but missed this one.  From an 
> engineering risk management perspective, we probably should have just 
> reverted the original serializer reuse patch and added a big 
> cross-product-of-configurations-and-shuffle-managers test suite before 
> attempting to fix the defects.
> I think that I have a pretty simple fix for this, but we still might want to 
> consider a revert for 1.4 just to be safe.
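
A note on why the sharing described above is unsafe, as a toy model rather than Spark or Kryo code. Kryo documents that with auto-reset off, references and unregistered class names can span multiple object graphs until reset() is called manually. The sketch below assumes that this state lives on the shared instance, so a record written to a second stream may be encoded as a back-reference to something that only the first stream contains. ToySerializerInstance, ToyReader, and SharedInstanceDemo are hypothetical names, and the Left/Right encoding is invented for illustration; it is not Kryo's wire format.

```scala
import scala.collection.mutable

// Toy stand-in for a serializer instance with auto-reset disabled:
// the table of already-written values lives on the instance and is never cleared.
final class ToySerializerInstance {
  private val seen = mutable.Map.empty[String, Int]

  // First occurrence is written in full (Left); later occurrences are written
  // as a back-reference to the id assigned earlier (Right).
  def write(stream: mutable.ArrayBuffer[Either[String, Int]], value: String): Unit =
    seen.get(value) match {
      case Some(id) => stream += Right(id)
      case None =>
        seen(value) = seen.size
        stream += Left(value)
    }
}

object ToyReader {
  // Each stream is decoded on its own, starting from an empty table.
  def read(stream: List[Either[String, Int]]): List[String] = {
    val table = mutable.ArrayBuffer.empty[String]
    stream.map {
      case Left(value) =>
        table += value
        value
      case Right(id) => table(id) // fails if the id was defined in another stream
    }
  }
}

object SharedInstanceDemo {
  // Writes the same record to two per-partition streams through one shared
  // instance and reports whether decoding the second stream fails.
  def secondStreamFails(): Boolean = {
    val shared = new ToySerializerInstance
    val partition0 = mutable.ArrayBuffer.empty[Either[String, Int]]
    val partition1 = mutable.ArrayBuffer.empty[Either[String, Int]]
    shared.write(partition0, "Hello")
    shared.write(partition1, "Hello") // encoded as Right(0), which partition1 never defines
    val firstOk = ToyReader.read(partition0.toList) == List("Hello")
    val secondFailed =
      try { ToyReader.read(partition1.toList); false }
      catch { case _: IndexOutOfBoundsException => true }
    firstOk && secondFailed
  }
}
```

In this model, giving each stream its own instance, or clearing the table between streams, removes the dangling reference. Whether that corresponds to the actual fix for SPARK-7873 is not stated in this message.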


