Josh Rosen created SPARK-7873:
---------------------------------
Summary: Serializer re-use + Kryo autoReset disabled leads to
ArrayIndexOutOfBoundsException in sort-shuffle bypassMergeSort path
Key: SPARK-7873
URL: https://issues.apache.org/jira/browse/SPARK-7873
Project: Spark
Issue Type: Bug
Components: Shuffle, Spark Core
Affects Versions: 1.4.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Blocker
This is a somewhat obscure bug, but I think that it will seriously impact
KryoSerializer users whose custom registrators disable auto-reset. When
auto-reset is disabled, some of our shuffle paths break. These paths create
multiple serialization output streams from the same shared SerializerInstance,
which is unsafe: Kryo's reference-tracking state is no longer cleared between
objects. As a result, one stream can end up containing back-references to
objects that were only written to another stream. To illustrate this, the
following test fails in 1.4:
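{code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SharedSparkContext
import org.apache.spark.serializer.{KryoRegistrator, KryoSerializer}
import org.scalatest.FunSuite

// Assumed definition of the registrator used below: it only disables auto-reset.
class RegistratorWithoutAutoReset extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = kryo.setAutoReset(false)
}

class KryoSerializerAutoResetDisabledSuite extends FunSuite with SharedSparkContext {
  conf.set("spark.serializer", classOf[KryoSerializer].getName)
  conf.set("spark.kryo.registrator", classOf[RegistratorWithoutAutoReset].getName)

  // Two reduce partitions is below spark.shuffle.sort.bypassMergeThreshold
  // (200 by default), so sort-shuffle takes the bypassMergeSort path.
  test("sort-shuffle with bypassMergeSort") {
    val myObject = ("Hello", "World")
    val result = sc.parallelize(Seq.fill(100)(myObject)).repartition(2).collect().toSet
    assert(result === Set(myObject))
  }
}
{code}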
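The following toy model shows why a shared instance breaks once auto-reset is off. It is a hypothetical sketch: `ToyRefWriter` and `readAll` are illustrative names that only mimic the idea of reference tracking. The writer emits a back-reference the second time it sees the same object. The reader resolves back-references against a table built from a single stream. Neither reproduces Kryo's actual API or wire format.

{code}
import java.util.IdentityHashMap
import scala.collection.mutable.ArrayBuffer

// Toy model of Kryo-style reference tracking (hypothetical, not Kryo's API).
final class ToyRefWriter(autoReset: Boolean) {
  // Objects already written, keyed by identity, mapped to their reference id.
  private val seen = new IdentityHashMap[AnyRef, Integer]()

  def write(out: ArrayBuffer[String], obj: AnyRef): Unit = {
    val id = seen.get(obj)
    if (id != null) {
      out += s"REF:$id" // back-reference to an object written earlier
    } else {
      seen.put(obj, seen.size)
      out += s"NEW:$obj"
    }
    // With auto-reset on, the table is cleared after every top-level object.
    if (autoReset) seen.clear()
  }
}

// The reader rebuilds its reference table from the tokens of ONE stream only.
def readAll(in: collection.Seq[String]): collection.Seq[String] = {
  val table = ArrayBuffer.empty[String]
  in.map { token =>
    if (token.startsWith("NEW:")) {
      val value = token.drop(4)
      table += value
      value
    } else {
      // Throws IndexOutOfBoundsException if the id was defined in another stream.
      table(token.drop(4).toInt)
    }
  }
}

val obj = "Hello"

// One shared writer with auto-reset disabled, feeding two partition streams.
val shared = new ToyRefWriter(autoReset = false)
val partition0 = ArrayBuffer.empty[String]
val partition1 = ArrayBuffer.empty[String]
shared.write(partition0, obj)
shared.write(partition1, obj)

// The same scenario with auto-reset enabled.
val resetting = new ToyRefWriter(autoReset = true)
val safe0 = ArrayBuffer.empty[String]
val safe1 = ArrayBuffer.empty[String]
resetting.write(safe0, obj)
resetting.write(safe1, obj)
{code}

With `autoReset = false`, the second write lands in partition 1's stream as `REF:0`. That token is a back-reference to an object that only partition 0's stream contains, so reading partition 1 on its own fails with an index error. This is roughly analogous to the exception in this issue. With `autoReset = true`, each stream is self-contained and reads back correctly.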
This bug was introduced by a patch that enabled serializer re-use in some of
the shuffle paths, since constructing new serializer instances is fairly
expensive for KryoSerializer. We had already fixed another corner-case bug
related to this change but missed this one. From an engineering
risk-management perspective, we probably should have reverted the original
serializer re-use patch first. We should also have added a test suite covering
the cross product of configurations and shuffle managers before attempting to
fix the defects.
I think I have a fairly simple fix for this, but we might still want to
consider a revert for 1.4 just to be safe.
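Serializer re-use saves construction cost, but two streams that are open at the same time must not share state. A borrow/release cache is one way to get both. The sketch below illustrates that idea and is not claimed to be the fix mentioned above. `StatefulSerializer` and `SerializerCache` are hypothetical, single-threaded stand-ins, not Spark or Kryo classes.

{code}
import scala.collection.mutable

// Hypothetical stand-in for an expensive, stateful serializer such as a Kryo
// instance: it carries per-stream reference-tracking state.
final class StatefulSerializer {
  val seenIds = mutable.Set.empty[Int]
  def reset(): Unit = seenIds.clear()
}

// Single-threaded borrow/release cache (hypothetical). The cached instance is
// handed out only while no open stream holds it, so two streams that are open
// at the same time never share reference-tracking state.
final class SerializerCache {
  private var cached: Option[StatefulSerializer] = None
  private var createdCount = 0

  def created: Int = createdCount

  def borrow(): StatefulSerializer = {
    val s = cached match {
      case Some(c) =>
        cached = None
        c
      case None =>
        createdCount += 1
        new StatefulSerializer
    }
    s.reset() // every stream starts with an empty reference table
    s
  }

  // Keep at most one idle instance for re-use; extra instances are dropped.
  def release(s: StatefulSerializer): Unit =
    if (cached.isEmpty) cached = Some(s)
}

val cache = new SerializerCache

// Sequential streams re-use the single cached instance.
val a = cache.borrow()
cache.release(a)
val b = cache.borrow()
cache.release(b)

// Two streams open at once (like per-partition writers) get distinct instances.
val p0 = cache.borrow()
val p1 = cache.borrow()
cache.release(p0)
cache.release(p1)
{code}

The common case of one stream at a time still pays the construction cost only once. A second concurrent stream pays for a fresh instance instead of silently sharing state.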
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)