stevedlawrence commented on pull request #505:
URL: https://github.com/apache/daffodil/pull/505#issuecomment-829218933
Finally figured out what's causing this one test to fail. It's actually
caused by Scala 2.12.12. I'm not sure what changed from 2.12.11 to 2.12.12 to
trigger it, but at least I know why it's failing and have some potential
workarounds. Here's a really simple way to reproduce the failure:
```scala
// Thing contains list of other Things
class Thing() extends Serializable {
  var things: Seq[Thing] = _
}

object Test {
  def main(args: Array[String]): Unit = {
    // Create two things
    val a = new Thing()
    val b = new Thing()

    // Put thing B in a list
    val l = Seq(b)

    // Set that list to the things list in both A and B
    a.things = l
    b.things = l

    // Serialize and Deserialize Thing A
    val baos = new java.io.ByteArrayOutputStream()
    val oos = new java.io.ObjectOutputStream(baos)
    oos.writeObject(a)
    oos.close()
    baos.close()
    val bytes = baos.toByteArray()
    val bais = new java.io.ByteArrayInputStream(bytes)
    val ois = new java.io.ObjectInputStream(bais)
    val newThingA = ois.readObject().asInstanceOf[Thing]
  }
}
```
So we have a list containing thing B, and both A and B have a reference to
that list. And then we try to serialize and deserialize A.
Unfortunately, this fails on the last line when trying to deserialize with
the error:
> java.lang.ClassCastException: cannot assign instance of
scala.collection.immutable.List$SerializationProxy to field Thing.things of
type scala.collection.Seq in instance of Thing
This happens because of the way that ``List``s are serialized. Rather than
serializing the ``List`` directly, Scala instead serializes a
``SerializationProxy``. When this is deserialized, Java first deserializes the
``SerializationProxy`` and then Scala converts that deserialized object back to
a ``List``. The issue is that with Java's default deserialization, other
deserialization work happens between the deserialization of the proxy and its
conversion to a ``List``, which causes problems. Below is a summary of the
steps that happen when we try to deserialize A:
1. Deserialize A
   1. Deserialize A.things
      1. Create a SerializationProxy for A.things
      1. Add the new SerializationProxy to a reference lookup table
      1. Recursively deserialize the list contents
         1. Deserialize B
            1. Deserialize B.things
               1. B.things is actually the same list reference that we added to the reference lookup table, so just set B.things to the value in that table, nothing extra to deserialize
            1. B is done
      1. The list contents are deserialized, build a new List from the deserialized SerializationProxy
      1. Set A.things to that new List
      1. Update the reference lookup table to have the new List instead of the SerializationProxy
      1. A.things is done
   1. A is done
So the issue is that when B.things is set to the value in the lookup table,
it is set to the SerializationProxy because the proxy hasn't been replaced with
the real List yet. This is where things fail, because B.things is expecting to
be set to a List and not a SerializationProxy, which is exactly what the error
is saying.
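
To show that the failure comes from the proxy pattern itself rather than anything specific to ``List``, here is a minimal sketch that reproduces the same back-reference problem with a hand-written proxy. ``Box``, ``BoxProxy``, ``Node``, and ``ProxyBackReference`` are hypothetical names made up for this illustration; they are not Scala library or Daffodil classes, and the real ``List`` proxy differs in its details.

```scala
import java.io._

// Hypothetical container that, like List, is serialized through a proxy.
final class Box(val items: Array[AnyRef]) extends Serializable {
  // Java serialization writes the returned proxy in place of the Box.
  private def writeReplace(): AnyRef = new BoxProxy(this)
}

// Hypothetical proxy: writes the Box contents by hand and rebuilds the
// Box once all of the contents have been read back.
final class BoxProxy(b: Box) extends Serializable {
  @transient private var box: Box = b

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeInt(box.items.length)
    box.items.foreach(item => out.writeObject(item))
  }

  private def readObject(in: ObjectInputStream): Unit = {
    val n = in.readInt()
    // Any back reference to the Box that is read here still resolves to
    // this proxy, because readResolve has not run yet.
    box = new Box(Array.fill[AnyRef](n)(in.readObject()))
  }

  // Called after readObject; only now is the proxy swapped for the real Box.
  private def readResolve(): AnyRef = box
}

// Plays the role of Thing: holds a reference to a (possibly shared) Box.
class Node extends Serializable {
  var box: Box = null
}

object ProxyBackReference {
  def roundTrip(obj: AnyRef): AnyRef = {
    val baos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(baos)
    oos.writeObject(obj)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray)).readObject()
  }

  // Returns true if deserialization fails with a ClassCastException.
  def sharedBoxFails(): Boolean = {
    val a = new Node
    val b = new Node
    val shared = new Box(Array[AnyRef](b))
    a.box = shared
    b.box = shared // b is inside the Box and also points back at it
    try { roundTrip(a); false }
    catch { case _: ClassCastException => true }
  }
}
```

The failing assignment is the one to ``b.box``: it happens while ``BoxProxy.readObject`` is still running, so the only object available for the back reference is the proxy.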
This same kind of recursive back reference with shared Lists is exactly
what's going on with the DPathElementCompileInfo (i.e. ``Thing``) and the
elementCompileInfos member (i.e. ``things``). We could maybe write custom
serialization code to fix this, but I'm not sure I really want to maintain
that. Maybe a simple alternative is to just change elementCompileInfos so it
is an Array instead of a List, which would avoid the SerializationProxy
entirely. Or I think we could clone the elementCompileInfos List. That way
there would be no shared references to the same List, at the expense of extra
memory usage.
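
As a rough sketch of those two workarounds, applied to the toy example rather than to the real Daffodil classes: ``ArrayThing``, ``SeqThing``, and ``Workarounds`` are hypothetical names used only for illustration. The sketch assumes that Java serializes arrays natively, with no proxy involved, and that giving each object its own List instance removes the shared back reference.

```scala
import java.io._

// Hypothetical variant of Thing that holds its children in an Array.
class ArrayThing extends Serializable {
  var things: Array[ArrayThing] = null
}

// Hypothetical variant of Thing that still uses Seq, as in the reproduction.
class SeqThing extends Serializable {
  var things: Seq[SeqThing] = null
}

object Workarounds {
  def roundTrip[T <: AnyRef](obj: T): T = {
    val baos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(baos)
    oos.writeObject(obj)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray))
    ois.readObject().asInstanceOf[T]
  }

  // Workaround 1: share an Array instead of a List.
  // Returns true if A and B still share one array after the round trip.
  def sharedArraySurvives(): Boolean = {
    val a = new ArrayThing
    val b = new ArrayThing
    val shared = Array(b)
    a.things = shared
    b.things = shared
    val newA = roundTrip(a)
    val newB = newA.things(0)
    newB.things eq newA.things
  }

  // Workaround 2: give A and B separate List instances with equal contents,
  // so neither List is reached again while its own proxy is being read.
  // Returns true if the round trip succeeds and B's list still contains B.
  def copiedListsSurvive(): Boolean = {
    val a = new SeqThing
    val b = new SeqThing
    a.things = List(b)
    b.things = List(b) // a distinct List instance, not the one in a.things
    val newA = roundTrip(a)
    val newB = newA.things.head
    newB.things.head eq newB
  }
}
```

The Array version keeps the sharing (A and B still point at one array after the round trip), while the copy version trades that sharing for extra List instances. Whether either trade-off is acceptable for elementCompileInfos is a separate question.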
Any other thoughts? Maybe we need to rethink the parent backpointer in
DPathCompileInfo, which I think is the root of the problem?