On 5/18/16 2:42 AM, Stephen Colebourne wrote:
My original blog on the topic was in 2010:
http://blog.joda.org/2010/02/serialization-shared-delegates_9131.html

Bear in mind that a key reason for sharing the serialization proxy is
to share the "serialized object header, serial version UID, class
descriptor" etc. It is that header overhead that is the main reason
for serialization being so space inefficient on the wire. It is thus a
positive thing that "unrelated classes" share the proxy.

JSR-310 goes to great lengths to save bytes in the stream - see
LocalTime for example. IMO, it would be really good to see
serialization move to a single package-level shared proxy in java.util
as well, as it would dramatically reduce many stream sizes (as per the
blog post).

So, the key aspects of the pattern that I see are:
- shared between multiple classes
- use a flag (byte) to distinguish classes
- top level class with a short name
- externalizable, not serializable
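
A rough sketch of how those four points could fit together, not the actual java.time.Ser or the proposed CollSer. The names SharedSer and SharedSerDemo, the LIST/SET codes, and the choice of replacement collections are all made up for illustration; a real collection class would return the proxy from its writeReplace() method.

```java
import java.io.*;
import java.util.*;

/** Hypothetical shared proxy: one short-named class, one flag byte per kind. */
final class SharedSer implements Externalizable {
    private static final long serialVersionUID = 1L;

    // Illustrative type codes; not taken from any real JDK class.
    static final byte LIST = 1;
    static final byte SET = 2;

    private byte type;
    private Object[] elements;

    /** Externalizable requires a public no-arg constructor. */
    public SharedSer() {}

    SharedSer(byte type, Object[] elements) {
        this.type = type;
        this.elements = elements.clone();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type);            // flag distinguishing the classes
        out.writeInt(elements.length);  // element count, then the elements
        for (Object e : elements) {
            out.writeObject(e);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        type = in.readByte();
        int n = in.readInt();
        if (n < 0) {
            throw new InvalidObjectException("negative length: " + n);
        }
        elements = new Object[n];
        for (int i = 0; i < n; i++) {
            elements[i] = in.readObject();
        }
    }

    /** Swaps the proxy for the real collection after deserialization. */
    private Object readResolve() throws ObjectStreamException {
        switch (type) {
            case LIST:
                return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(elements)));
            case SET:
                return Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(elements)));
            default:
                throw new InvalidObjectException("unknown type: " + type);
        }
    }
}

/** Hypothetical helper that serializes and deserializes in memory. */
final class SharedSerDemo {
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every class in the package maps to the same proxy, the class descriptor is written once per stream and later instances refer back to it, which is where the sharing pays off.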

The primary goal of the serialization proxy in this case is to prevent the concrete collections implementation classes from leaking into the serial format. Another major goal is to provide for backward *and* forward compatibility. Minimizing the serial stream size is nice, but is less important than compatibility.

I'd like to set aside this notion of a "single package-level shared proxy" for java.util. There are too many other unrelated things already in java.util that have their own serial formats that cannot be changed. It's too much of a blanket statement to say that "all new serializable things in java.util" should use a single proxy, since we have zero examples of this, and they potentially could have arbitrarily different requirements for their serial formats.

Future-proofing the serial proxy for future *collections* implementations in java.util is quite sensible, though.

JSR-310 chooses to delegate the actual logic back to the class itself,
but this is not required by the pattern. What CollSer does not do is
implement Externalizable. And as I've argued, I believe it is a *good*
thing to share a Ser class across a package (to overcome the
limitations of the ancient Serialization spec).

OK, it's good to know you don't consider the back-delegation to be required. It's a fairly prominent feature of the java.time classes. I was concerned that was what was being proposed here, and it would be a fairly intrusive change.

Anyway, I've done half the work for you ;-)
https://gist.github.com/jodastephen/2bb70e1f1180b030d46b5a6366c0a0c4

With a collection of 1 string, CollSer uses 136 bytes while my
Externalizable Ser uses 58.
With a collection of 3 strings, CollSer uses 171 bytes while my
Externalizable Ser uses 87.

These are the contents of the stream. As can be seen, the
Externalizable form avoids two java.lang.Object references.

CollSer:
136 
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]
171 
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]sq[0]~[0][0][0][0][0][1]uq[0]~[0][3][0][0][0][3]t[0][1]at[0][2]bbt[0][3]ccc

Ser:
58 
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]x
87 
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]xsq[0]~[0][0]w[5][1][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx

Interesting, nice testbed, thanks.

It turns out the main culprit here is the field "Object[] array" which is responsible for most of the bulk. (For others' edification, the serial stream contains descriptions of fields including their type names, which for this field is "[Ljava/lang/Object;". And when the array is serialized, its class descriptor, including its name -- again -- is included.)

This suggests an easy way to reduce the bulk of the serial data, which is to make the Object[] field transient and to use custom serial data to write the array's length followed by its contents. (This is similar to what the other collections' serial forms do.)
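
A minimal sketch of that change, assuming a proxy with an int tag and an Object[] payload. NaiveProxy, CompactProxy and ProxyDemo are hypothetical names used only for this comparison; this is not the actual CollSer source.

```java
import java.io.*;

/** Baseline: the array is an ordinary serializable field. */
final class NaiveProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int tag;
    private final Object[] array;  // field type and array class descriptor both reach the stream

    NaiveProxy(int tag, Object... array) {
        this.tag = tag;
        this.array = array.clone();
    }
}

/** Variant: the array is transient and written by hand as length + contents. */
final class CompactProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int tag;             // the only field described in the stream
    private transient Object[] array;  // skipped by default serialization

    CompactProxy(int tag, Object... array) {
        this.tag = tag;
        this.array = array.clone();
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();      // writes tag
        out.writeInt(array.length);    // custom data: length first
        for (Object o : array) {
            out.writeObject(o);        // then each element
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();        // reads tag
        int len = in.readInt();
        if (len < 0) {
            throw new InvalidObjectException("negative length: " + len);
        }
        Object[] a = new Object[len];
        for (int i = 0; i < len; i++) {
            a[i] = in.readObject();
        }
        array = a;
    }

    int tag() { return tag; }
    Object[] array() { return array.clone(); }
}

/** Hypothetical in-memory helpers for comparing the two forms. */
final class ProxyDemo {
    static byte[] toBytes(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Comparing ProxyDemo.toBytes(new NaiveProxy(1, "a")).length with the CompactProxy equivalent should show the transient form coming out smaller, since neither the field's type string nor the Object[] class descriptor is written.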

After renaming the modified class "Se3" to match the length of the name "Ser", and renaming the "flags" field to "tag" to save a couple more bytes, running the serialization tester gets the following:

63
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]x
91
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]xsq[0]~[0][0]w[4][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx

This is only a handful of bytes larger than the Externalizable alternative.

While I understand the forward/backward compatibility argument for (int
& 0xff), I'm unconvinced of the need. If things change, I don't
expect the new format to be loadable in an earlier JDK version.

Backward and forward serial compatibility has historically been an issue for serializable classes in the JDK. I'm not willing to shave off a few more bytes in order to compromise this.
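
A small sketch of the (int & 0xff) idea, assuming the low 8 bits of the int tag select the kind of collection and the upper 24 bits are left for future use. TagLayout and its constants are hypothetical names for illustration only.

```java
/** Hypothetical tag layout: low 8 bits = kind, upper 24 bits reserved. */
final class TagLayout {
    static final int KIND_MASK = 0xff;

    // Illustrative kind codes.
    static final int KIND_LIST = 1;
    static final int KIND_SET = 2;

    /** Extracts the kind, ignoring any reserved bits a newer writer may have set. */
    static int kindOf(int tag) {
        return tag & KIND_MASK;
    }
}
```

With this layout a reader that only knows the kind codes can still decode a tag such as 0x00000102 as KIND_SET, even if a later release starts putting information in the upper bits.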

s'marks
