Let me try to summarize the issue, as I see it.
Circling back to David's original quote from the Serialization spec [1]:
"The class's writeObject method, if implemented, is responsible for
saving the state of the class. Either ObjectOutputStream's
defaultWriteObject or writeFields method must be called once (and only
once) before writing any optional data that will be needed by the
corresponding readObject method to restore the state of the object; even
if no optional data is written, defaultWriteObject or writeFields must
still be invoked once. If defaultWriteObject or writeFields is not
invoked once prior to the writing of optional data (if any), then the
behavior of instance deserialization _is undefined in cases where the
ObjectInputStream cannot resolve the class which defined the writeObject
method in question_."
The underlined section above is the most relevant part. It qualifies
the scenario in which the behavior is undefined. I read it to mean that
the behavior is undefined if, and only if, the OIS cannot resolve the
class which defined the writeObject method. This seems in line with
David's description [2], which I agree with:
"I think the specifics of the quote relate to this kind of class change;
in particular, if a class is deleted from the hierarchy on the read
side, and that class corresponds to the class that had the misbehaving
writeObject, I suspect that things will break at that point as the read
side will probably try to consume and discard the field information for
that class, which will be missing (it will start reading the next class'
fields instead I think)."
My take on this is that the above qualification on writeObject refers
to a compatibility issue. Since removing a class from the hierarchy is a
compatible change [3], defaultWriteObject or writeFields (and, on the
read side, defaultReadObject or readFields) must be called; otherwise,
if a class is removed from the hierarchy, the behavior is undefined. In
my testing I get StreamCorruptedException, but I can see how this could
behave differently, depending on the class hierarchy and the actual
serialization state.
If the class defining the writeObject is resolvable, then the behavior
is *not* undefined.
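A minimal sketch of the resolvable case, assuming a hypothetical Timestamp class (not from this thread) whose writeObject writes only optional data and never calls defaultWriteObject. Because the same class is resolvable on the reading side, its readObject consumes exactly what writeObject produced, and the round trip succeeds:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ResolvableRoundTrip {

    // Hypothetical class: writeObject skips defaultWriteObject and writes
    // only optional data. Its only instance field is transient.
    static class Timestamp implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient long millis;

        Timestamp(long millis) {
            this.millis = millis;
        }

        long millis() {
            return millis;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            // Non-conforming per the spec: no out.defaultWriteObject() first.
            out.writeLong(millis);
        }

        private void readObject(ObjectInputStream in) throws IOException {
            // Mirrors writeObject, so it reads back exactly what was written.
            millis = in.readLong();
        }
    }

    static Timestamp roundTrip(Timestamp t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            // Timestamp resolves here, so its own readObject is invoked.
            return (Timestamp) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints 1392620220000
        System.out.println(roundTrip(new Timestamp(1392620220000L)).millis());
    }
}
```

This only exercises the resolvable case. The undefined case needs a different class hierarchy on the reading side, which a single class definition in one program can't reproduce.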
Do we agree on what is actually undefined, and what is not?
-Chris.
[1] http://docs.oracle.com/javase/7/docs/platform/serialization/spec/output.html#861
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-February/025069.html
[3] http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html#6754
On 17/02/14 07:17, Stuart Marks wrote:
On 2/14/14 9:43 AM, David M. Lloyd wrote:
On 02/14/2014 09:56 AM, David M. Lloyd wrote:
In the JDK, java.util.Date does not read/write fields. Perhaps others
as well. Given that the behavior is presently undefined, that means the
serialized representation of java.util.Date (and any other such
non-conforming classes) is also undefined.
An interesting detail here - since Date doesn't have any non-transient
fields, this happens to work out OK for a second reason (that
defaultReadFields() would read nothing anyway) - however it still would
break if a non-transient field were to be added.
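One way to check the "no non-transient fields" detail is to ask ObjectStreamClass which serializable fields it derives for a class. In this sketch, DateLike is a hypothetical class with one non-transient field added; the field name zoneId is an assumption for illustration only:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Date;

public class SerializableFields {

    // Hypothetical Date-like class with one non-transient field added.
    static class DateLike implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient long fastTime; // transient: not part of field data
        private String zoneId;           // non-transient: would be field data
    }

    // Number of fields the default mechanism would write/read for cl,
    // or -1 if the class is not serializable (lookup returns null).
    static int serializableFieldCount(Class<?> cl) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cl);
        return desc == null ? -1 : desc.getFields().length;
    }

    public static void main(String[] args) {
        System.out.println(serializableFieldCount(Date.class));     // 0
        System.out.println(serializableFieldCount(DateLike.class)); // 1
    }
}
```

With zero serializable fields there is no field data for a reader to consume, which is why Date gets away with it; once a field like zoneId exists, the stream's class descriptor advertises field data that a non-conforming writeObject never wrote.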
Hi David,
(coming late to this party)
Thanks for pointing out these clauses in the serialization
specification. I always knew that these methods "should" behave this
way but I was unaware of the undefined qualification in the spec, and
I was also unaware that even JDK classes like java.util.Date have
readObject/writeObject methods that don't fulfil this requirement.
I also think you're right that these problems are widespread. A recent
blog post on serialization [1] has some sample code whose
readObject/writeObject methods don't fulfil this requirement either.
On the other hand, this requirement doesn't seem to appear in the
javadoc anywhere that I can find. The class doc for
java.io.Serializable is the most explicit, and it says:
The writeObject method is responsible for writing the state of the
object for its particular class so that the corresponding readObject
method can restore it. The default mechanism for saving the Object's
fields can be invoked by calling out.defaultWriteObject. The method
does not need to concern itself with the state belonging to its
superclasses or subclasses. State is saved by writing the individual
fields to the ObjectOutputStream using the writeObject method or by
using the methods for primitive data types supported by DataOutput.
The readObject method is responsible for reading from the stream and
restoring the classes fields. It may call in.defaultReadObject to
invoke the default mechanism for restoring the object's non-static
and non-transient fields. The defaultReadObject method uses
information in the stream to assign the fields of the object saved
in the stream with the correspondingly named fields in the current
object. This handles the case when the class has evolved to add new
fields. The method does not need to concern itself with the state
belonging to its superclasses or subclasses. State is saved by
writing the individual fields to the ObjectOutputStream using the
writeObject method or by using the methods for primitive data types
supported by DataOutput.
The wording here seems to imply that calling defaultWriteObject and
defaultReadObject is optional.
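For contrast, a sketch of methods that follow the spec's stricter rule. It uses the writeFields/readFields pair that the spec names as the alternative to defaultWriteObject/defaultReadObject; the Account class and its fields are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConformingFields {

    // Hypothetical class: field data is written exactly once, before any
    // optional data, and read back in the same order.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private String owner;
        private int balance;
        private transient String note; // carried as optional data

        Account(String owner, int balance, String note) {
            this.owner = owner;
            this.balance = balance;
            this.note = note;
        }

        String owner() { return owner; }
        int balance() { return balance; }
        String note() { return note; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("owner", owner);
            fields.put("balance", balance);
            out.writeFields();     // field data first, exactly once...
            out.writeObject(note); // ...then the optional data
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            // get(name, default) returns the default if the stream has no value.
            owner = (String) fields.get("owner", null);
            balance = fields.get("balance", 0);
            note = (String) in.readObject();
        }
    }

    static Account roundTrip(Account a) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("chris", 42, "optional"));
        // prints "chris 42 optional"
        System.out.println(copy.owner() + " " + copy.balance() + " " + copy.note());
    }
}
```

Calling out.defaultWriteObject() and in.defaultReadObject() in place of the putFields/writeFields and readFields calls would satisfy the same rule; either way the field data precedes the optional data.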
It does look like the various bits of the specification could use some
cleanup.
In your initial post, you said that problems with the serialization
specification have caused problems for users. Can you be more specific
about what these problems were?
In another message earlier in this thread, you had made a few
suggestions:
1) do nothing :(
2) start throwing (or writing) an exception in write/readObject when
stream ops are performed without reading fields (maybe can be
disabled with a sys prop or something)
3) leave fields cleared and risk protocol issues
4) silently start reading/writing empty field information (risks
protocol issues)
I'd have to say that #2 is pretty close to a non-starter. Since the
problem does appear to be widespread, a lot of software would start
suffering this exception even if it otherwise seems to be behaving
correctly. This is clearly a big behavioral incompatibility, and even
if it could be mitigated with a system property, I'd question whether
it was worthwhile.
#4 also seems to be a fairly large incompatibility. If a class's
writeObject method is missing a defaultWriteObject call, it has a
fairly stable behavior, although one that's defined by the
implementation as opposed to the specification. (Although the
specification isn't self-consistent, per the above.) Silently changing
the bytes emitted in these cases would certainly cause
incompatibilities with existing readObject methods that are unprepared
to deal with them.
#3 leads me to mention another area of the serialization specification
that *is* well-defined, which is what occurs if fields are added or
removed between one version of a class and the next. This is covered in
sections 5.6.1 and 5.6.2 of the spec. [2] Briefly, if the current
object has fields for which values are not present in the
serialization stream, those fields are initialized to their default
values (zero, null, false). Does this have any bearing on the issues
you're concerned about? (It doesn't say so very explicitly, but field
data that appears in the serialized form is ignored if there is no
corresponding field in the current class.)
Finally, another suggestion that might help with these issues is not
to change the JDK, but to use static analysis tools such as FindBugs
to help programmers identify problem code.
s'marks
[1] http://marxsoftware.blogspot.com/2014/02/serializing-java-objects-with-non.html
[2] http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html#5172