[ https://issues.apache.org/jira/browse/SPARK-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-7873:
------------------------------
    Description: 
This is a somewhat obscure bug, but I think that it will seriously impact 
KryoSerializer users who use custom registrators that disable auto-reset.  
When auto-reset is disabled, some of our shuffle paths break because they end 
up creating multiple OutputStreams from the same shared SerializerInstance, 
which is unsafe.  To illustrate this, the following test fails in 1.4:

{code}
// SharedSparkContext and RegistratorWithoutAutoReset come from Spark's test sources.
import org.apache.spark.SharedSparkContext
import org.apache.spark.serializer.KryoSerializer
import org.scalatest.FunSuite

class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
  }
}
{code}
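
For context, the registrator used here just turns off Kryo's auto-reset from 
the registerClasses hook.  A minimal sketch of such a registrator (Spark's 
test helper RegistratorWithoutAutoReset is assumed to do essentially this):

{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// A user registrator that disables Kryo's auto-reset: the configuration that
// triggers this bug. Sketch only; see Spark's test sources for the real helper.
class RegistratorWithoutAutoReset extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.setAutoReset(false)
  }
}
{code}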

This was introduced by a patch (SPARK-3386) that enabled serializer re-use in 
some of the shuffle paths, since constructing new serializer instances is 
actually pretty costly for KryoSerializer.  We had already fixed another 
corner-case bug related to this (SPARK-7766), but missed this one.  From an 
engineering risk-management perspective, we probably should have just reverted 
the original serializer-reuse patch and added a big 
cross-product-of-configurations-and-shuffle-managers test suite before 
attempting to fix the defects.
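
To see concretely why sharing one instance across streams is unsafe, here is a 
standalone Kryo-only sketch (not Spark code; the object name is mine, reference 
tracking is enabled explicitly, and the exact exception may vary).  With 
auto-reset off, the reference table populated while writing one stream 
persists, so a second stream starts with back-references that a reader of that 
stream alone cannot resolve:

{code}
import java.io.ByteArrayOutputStream
import java.util

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// Standalone sketch of the cross-stream state leak when auto-reset is off.
object AutoResetLeakDemo extends App {
  val kryo = new Kryo()
  kryo.register(classOf[util.ArrayList[_]])
  kryo.setReferences(true)  // reference tracking, made explicit for clarity
  kryo.setAutoReset(false)  // what a RegistratorWithoutAutoReset-style registrator does

  val obj = new util.ArrayList[String](util.Arrays.asList("Hello", "World"))

  def writeToFreshStream(o: AnyRef): Array[Byte] = {
    val baos = new ByteArrayOutputStream()
    val output = new Output(baos)
    kryo.writeClassAndObject(output, o)
    output.close()
    baos.toByteArray
  }

  val first = writeToFreshStream(obj)   // full serialized form of obj
  val second = writeToFreshStream(obj)  // largely a back-reference into the first stream's state

  println(s"first: ${first.length} bytes, second: ${second.length} bytes")

  // A reader that only sees the second stream cannot resolve that reference;
  // this fails (e.g. with a KryoException or an index-out-of-bounds error,
  // much like the ArrayIndexOutOfBoundsException seen in the shuffle path).
  val reader = new Kryo()
  reader.register(classOf[util.ArrayList[_]])
  reader.setReferences(true)
  reader.readClassAndObject(new Input(second))
}
{code}

The bypassMergeSort path hits the same failure mode because each per-partition 
output file is assumed to be a self-contained stream.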

I think that I have a pretty simple fix for this, but we still might want to 
consider a revert for 1.4 just to be safe.



> Serializer re-use + Kryo autoReset disabled leads to ArrayIndexOutOfBounds 
> exception in sort-shuffle bypassMergeSort path
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-7873
>                 URL: https://issues.apache.org/jira/browse/SPARK-7873
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>


