[
https://issues.apache.org/jira/browse/SPARK-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen updated SPARK-7873:
------------------------------
Description:
This is a somewhat obscure bug, but I think it will seriously impact
KryoSerializer users whose custom registrators disable auto-reset. When
auto-reset is disabled, some of our shuffle paths break because they end up
creating multiple OutputStreams from the same shared SerializerInstance, which
is unsafe. To illustrate this, the following test fails in 1.4:
{code}
class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
  }
}
{code}
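The unsafe sharing can be sketched without Spark or Kryo at all. The toy serializer below is a hypothetical illustration (not Kryo's actual API) of what happens when auto-reset is disabled: once a value has been written, later writes emit only a back-reference into the serializer's internal table. If one such instance is shared across two independent output streams, the second stream holds a reference whose target bytes live in the first stream, so it cannot be deserialized on its own. That is the same shape of failure as in the bypassMergeSort path, where each partition file must be readable independently.

```scala
import scala.collection.mutable

// Toy serializer mimicking Kryo's reference tracking with autoReset
// disabled: a previously seen value is written as a back-reference id
// instead of its full bytes. (Illustrative sketch only, not Kryo's API.)
class StatefulSerializer {
  private val seen = mutable.LinkedHashMap[String, Int]()

  // Serialize one value into `out`, emitting a reference if already seen.
  def write(out: mutable.ArrayBuffer[String], value: String): Unit =
    seen.get(value) match {
      case Some(id) => out += s"ref:$id"
      case None =>
        seen(value) = seen.size
        out += s"val:$value"
    }
}

object SharedInstanceHazard {
  def main(args: Array[String]): Unit = {
    val serializer = new StatefulSerializer     // one shared instance
    val stream1 = mutable.ArrayBuffer[String]()
    val stream2 = mutable.ArrayBuffer[String]()

    serializer.write(stream1, "Hello")          // full value lands in stream 1
    serializer.write(stream2, "Hello")          // stream 2 gets only a reference

    // Deserializing stream 2 alone cannot resolve ref:0 -- the referenced
    // bytes live in stream 1, which an independent reader never sees.
    println(stream1.mkString(","))              // val:Hello
    println(stream2.mkString(","))              // ref:0
  }
}
```

Because the registrator, not Spark, controls the auto-reset flag, the shuffle code cannot assume resets happen between streams; it has to either reset explicitly or avoid sharing the instance.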
This was introduced by a patch (SPARK-3386) which enabled serializer re-use in
some of the shuffle paths, since constructing new serializer instances is
actually pretty costly for KryoSerializer. We had already fixed one
corner-case bug related to this (SPARK-7766), but missed this one. From an
engineering risk management perspective, we probably should have just reverted
the original serializer re-use patch and added a comprehensive
cross-product-of-configurations-and-shuffle-managers test suite before
attempting to fix the defects.
I think that I have a pretty simple fix for this, but we still might want to
consider a revert for 1.4 just to be safe.
was:
This is a somewhat obscure bug, but I think it will seriously impact
KryoSerializer users whose custom registrators disable auto-reset. When
auto-reset is disabled, some of our shuffle paths break because they end up
creating multiple OutputStreams from the same shared SerializerInstance, which
is unsafe. To illustrate this, the following test fails in 1.4:
{code}
class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
  }
}
{code}
This was introduced by a patch which enabled serializer re-use in some of the
shuffle paths, since constructing new serializer instances is actually pretty
costly for KryoSerializer. We had already fixed one corner-case bug related to
this, but missed this one. From an engineering risk management perspective, we
probably should have just reverted the original serializer re-use patch and
added a comprehensive cross-product-of-configurations-and-shuffle-managers
test suite before attempting to fix the defects.
I think that I have a pretty simple fix for this, but we still might want to
consider a revert for 1.4 just to be safe.
> Serializer re-use + Kryo autoReset disabled leads to ArrayIndexOutOfBounds
> exception in sort-shuffle bypassMergeSort path
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-7873
> URL: https://issues.apache.org/jira/browse/SPARK-7873
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 1.4.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Priority: Blocker
>
> This is a somewhat obscure bug, but I think it will seriously impact
> KryoSerializer users whose custom registrators disable auto-reset. When
> auto-reset is disabled, some of our shuffle paths break because they end up
> creating multiple OutputStreams from the same shared SerializerInstance,
> which is unsafe. To illustrate this, the following test fails in 1.4:
> {code}
> class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
>   conf.set("spark.serializer", classOf[KryoSerializer].getName)
>   conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)
>
>   test("sort-shuffle with bypassMergeSort") {
>     val myObject = ("Hello", "World")
>     assert(sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet === Set(myObject))
>   }
> }
> {code}
> This was introduced by a patch (SPARK-3386) which enabled serializer re-use
> in some of the shuffle paths, since constructing new serializer instances is
> actually pretty costly for KryoSerializer. We had already fixed one
> corner-case bug related to this (SPARK-7766), but missed this one. From an
> engineering risk management perspective, we probably should have just
> reverted the original serializer re-use patch and added a comprehensive
> cross-product-of-configurations-and-shuffle-managers test suite before
> attempting to fix the defects.
> I think that I have a pretty simple fix for this, but we still might want to
> consider a revert for 1.4 just to be safe.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]