stevedlawrence commented on PR #1286:
URL: https://github.com/apache/daffodil/pull/1286#issuecomment-2334007707
I've done a bunch more research and experiments to figure out what's going on.
---
TL;DR: Even though some zero-length arrays are separately allocated and do
not have reference equality, Java serializes them in a way that makes them the
same reference. This happens very consistently with one particular array (the
`delimiters` member in `DelimiterStackParser`), but not 100% of the time. And I
have never been able to reproduce this behavior (non-reference-equal things
serialized as reference-equal things) in any other case, even with arrays of
the same type.
---
I wrote a test Scala file that serialized an object containing zero-length
arrays in different ways (i.e. sometimes sharing one reference, sometimes
uniquely allocated and not reference equal), looked at the serialization dump,
and confirmed that my interpretation was correct. An array that has reference
equality with a previously serialized array looks like this:
```
fieldName
(array)
TC_REFERENCE - 0x71
Handle - 8258634 - 0x00 7e 04 4a
```
But an array that does not have reference equality with any previously serialized array serializes to this:
```
fieldName
(array)
TC_ARRAY - 0x75
TC_REFERENCE - 0x71
Handle - 8258633 - 0x00 7e 04 49
newHandle 0x00 7e 04 c6
Array size - 0 - 0x00 00 00 00
Values
```
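In the first case the TC_REFERENCE points to the previously serialized array. In
the second case the TC_REFERENCE points only to the array's class descriptor,
which was already written because the type is the same; the array itself is a
completely separate object with its own new handle.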
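
A minimal Java sketch of this kind of experiment (the original test file was Scala and isn't shown here): a hypothetical `Holder` with two `String[]` fields is round-tripped through `ObjectOutputStream`/`ObjectInputStream`, once with both fields pointing at the same zero-length array and once with two separately allocated ones. With the stock JDK serializer, the shared case should use the `TC_REFERENCE` form for the second field and come back as one array, while the distinct case should write two `TC_ARRAY` entries and come back as two arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ZeroLengthArrayDemo {

    // Hypothetical holder with two array fields; String[] is used only for illustration.
    static final class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] a;
        final String[] b;

        Holder(String[] a, String[] b) {
            this.a = a;
            this.b = b;
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the two fields are the same reference after a serialize/deserialize round trip.
    static boolean sameAfterRoundTrip(Holder h) {
        Holder copy = (Holder) deserialize(serialize(h));
        return copy.a == copy.b;
    }

    public static void main(String[] args) {
        String[] shared = new String[0];
        Holder sharedHolder = new Holder(shared, shared);                 // one array, referenced twice
        Holder distinctHolder = new Holder(new String[0], new String[0]); // two separate allocations

        System.out.println("shared:   " + sameAfterRoundTrip(sharedHolder));   // true
        System.out.println("distinct: " + sameAfterRoundTrip(distinctHolder)); // false

        // Field b of sharedHolder is written as a back-reference (TC_REFERENCE),
        // so that stream is shorter than the one with two TC_ARRAY entries.
        System.out.println("bytes: shared=" + serialize(sharedHolder).length
                + ", distinct=" + serialize(distinctHolder).length);
    }
}
```

Comparing these byte streams with a serialization dump tool should show the two patterns quoted above.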
What's confusing is that I've confirmed that none of the serialized
delimiter arrays have reference equality when they are created. They are all
separately allocated arrays. But when inspecting the serialized dump, the
zero-length arrays do have reference equality. I've also confirmed that after
deserialization the arrays do have reference equality, and that the fields do
not have reference equality at the point the serialization function
writeObject() is called. So it must be somewhere in the actual serialization
code (i.e. ObjectOutputStream.defaultWriteObject) that this optimization happens.
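
One way to do the writeObject() check is a private `writeObject` hook that compares the fields by identity right before delegating to `defaultWriteObject()`, plus an identity check on the deserialized copy. The sketch below uses a hypothetical `Parser` with `String[]` fields, not Daffodil's actual `DelimiterStackParser`, and with the stock JDK serializer both checks should report distinct arrays; it does not reproduce the `delimiters` behavior described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class WriteObjectIdentityCheck {

    // Hypothetical stand-in for a parser; not Daffodil's DelimiterStackParser.
    static final class Parser implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] delimiters;
        final String[] other;
        // Recorded during serialization; transient so it is not written itself.
        transient boolean distinctAtWriteTime;

        Parser(String[] delimiters, String[] other) {
            this.delimiters = delimiters;
            this.other = other;
        }

        // ObjectOutputStream calls this private hook instead of serializing the fields directly.
        private void writeObject(ObjectOutputStream out) throws IOException {
            distinctAtWriteTime = delimiters != other; // identity check just before writing
            out.defaultWriteObject();                   // normal field serialization
        }
    }

    // Serializes p, deserializes it, and reports whether the copy's fields are still distinct.
    static boolean distinctAfterRoundTrip(Parser p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Parser copy = (Parser) in.readObject();
                return copy.delimiters != copy.other;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Parser p = new Parser(new String[0], new String[0]);
        boolean after = distinctAfterRoundTrip(p);
        System.out.println("distinct at writeObject(): " + p.distinctAtWriteTime); // true
        System.out.println("distinct after readObject(): " + after);              // true
    }
}
```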
What's also weird is that I added a new (unused) member to a non-delimiter parser
and allocated zero-length `Array[DelimiterParseEv]` instances, and those were not
changed at all during serialization. They all stayed separately allocated arrays
without reference equality. So whatever is doing this optimization doesn't do
it in this case. But it almost always does it for the `delimiters` field in the
`DelimiterStackParser`, though not 100% of the time.
It seems like some kind of deep Java optimization to save some bytes during
serialization, but it is not very consistent. I'm also surprised Java is even
allowed to do that. Making things reference equal seems like it could
potentially break code that relies on objects being distinct (i.e. not
reference equal).
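
As a small illustration of that concern: any code that keys state on object identity, for example a `java.util.IdentityHashMap`, sees a different number of entries depending on whether two empty arrays are the same reference. The `distinctKeys` method below is hypothetical and not taken from Daffodil.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentitySensitiveLookup {

    // Hypothetical: per-array state kept in an identity-keyed map, which ignores
    // equals/hashCode and compares keys with ==.
    static int distinctKeys(Object[] first, Object[] second) {
        Map<Object[], String> state = new IdentityHashMap<>();
        state.put(first, "first");
        state.put(second, "second");
        return state.size(); // 2 if the arrays are distinct objects, 1 if they are the same reference
    }

    public static void main(String[] args) {
        Object[] shared = new Object[0];
        System.out.println(distinctKeys(new Object[0], new Object[0])); // 2
        System.out.println(distinctKeys(shared, shared));               // 1
    }
}
```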
--
This is an automated message from the Apache Git Service.