> On Feb 4, 2017, at 3:37 AM, Niclas Hedhman <nic...@hedhman.org> wrote:
> 
> Gregg,
> I know that you can manage to "evolve" the binary format if you are
> incredibly careful and not make mistakes. BUT, that seems really hard,
> since EVEN Sun/Oracle state that using Serilazation for "long live objects"
> are highly discouraged. THAT is a sign that it is not nearly as easy as you
> make it sound to be, and it is definitely different from XML/JSON as once
> the working codebase is lost (i.e. either literally lost (yes, I have been
> involved trying to restore that), or modified so much that compatibility
> broke, which happens when serialization is not the primary focus of a
> project) then you are pretty much screwed forever, unlike XML/JSON.

I think there are some realistic issues as you describe here.  Certainly if the 
XML or JSON can be “read”, you can get some of the data out of it.  Java 
Serialization, or any binary structure, requires more knowledge to extract the 
“data” from.  I am not going to argue that point, other than to say that you 
have to understand the implications of this failure mode and do the right 
things up front so that you have documentation, a documented serial version id 
plan, etc.  Not impossible, but indeed additional “work”.

> 
> Now, you may say, that is for "long lived serialized states" but we are
> dealing with "short lived" ones. However, in today's architectures and
> platforms, almost no organization manages to keep all parts of a system
> synchronized when it comes to versioning. Different parts of a system is
> upgraded at different rates. And this is essentially the same as "long
> lived objects" ---  "uh this was serialized using LibA 1.1, LibB 2.3 and
> JRE 1.4, and we are now at LibA 4.6, LibB 3.1 and Java 8", do you see the
> similarity? If not, then I will not be able to convince you. If you do,
> then ask "why did Sun/Oracle state that long-lived objects with Java
> Serialization was a bad idea?", or were they also clueless on how to do it
> right, which seems to be your actual argument.

My actual argument is that “data” is “data”.  It doesn’t matter how it’s 
“structured”.  The only thing that JSON or XML has on “binary” is that you can 
“look” at it with your eyes and feel more comfortable with what you see.  If I 
typed the following two sets of byte sequences at you, what could you tell me 
about them?

00 01 00 00 00 06 01 03 00 00 00 02
00 01 00 00 00 06 01 04 42 28 00 00

In the right context, you could tell me that the first is a ModbusTCP request 
for two holding registers, 40001 and 40002.  You’d look at the second, the 
reply packet, and say that the two returned registers look like a floating 
point number because the first data byte is 42.  You could even tell me that 
the floating point number itself is the value 42.0.
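
Just to make that last step concrete, here is a minimal sketch of pulling that 
float back out of the reply bytes above with plain JDK calls (no Modbus library 
assumed):

        byte[] reply = {
            0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x04,
            0x42, 0x28, 0x00, 0x00
        };
        // The two 16-bit registers, big-endian, reassembled into one 32-bit value.
        int bits = java.nio.ByteBuffer.wrap( reply, 8, 4 ).getInt();
        float value = Float.intBitsToFloat( bits );   // 42.0f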

My point is that there is always context (a ModbusTCP conversation log), 
knowledge (I know Modbus like the back of my hand) and experience (I know the 
general structure of IEEE floating point, and because I have stared at these 
byte streams when I knew there were floating point numbers involved, I can 
recognize this).

Would I be faster at knowing what I was looking at if I saw

        { "downhole_temp" : 42 }

instead?  Sure.  But that “costs” bandwidth across my cellular modem link, and 
sending JSON instead of binary data further decreases the total number of 
requests I can fit across that fixed bandwidth.

My point is that it’s just data, but it satisfies another need I have: reducing 
bandwidth between the source of the data and the user of the data improves 
system performance.

Additionally, it is not “free” to marshal and unmarshal JSON or XML for use by 
an application.  I use large (100k or more) XML documents to “describe” the 
details of devices that use Modbus communications.  I do that because I can 
then use XSLT to transform them into HTML documents for human consumption, so 
that these technical descriptions can be reviewed visually, where it is easier 
to depict the details.  Thus 42 28 00 00 becomes something like 
{ "downhole_temp" : 42 } to ease consumption for those who don’t have the 
training, experience and knowledge I have.
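
The transformation step itself is only a few lines with the JAXP APIs bundled 
in the JDK; here is a rough sketch with hypothetical file names (exception 
handling elided):

        import javax.xml.transform.*;
        import javax.xml.transform.stream.*;

        // Apply the stylesheet to the device description and write out HTML.
        Transformer t = TransformerFactory.newInstance()
            .newTransformer( new StreamSource( "device-description.xsl" ) );
        t.transform( new StreamSource( "device-description.xml" ),
                     new StreamResult( "device-description.html" ) );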

Java Serialization has adequate control points to manage evolution of the data 
in ways that really are “evolution”.  You do have to understand precisely what 
the effect of your changes to the “data” in your object is, and how code 
referencing that “data”, either directly or “functionally”, can cope with what 
is going on.

These are important details.  They do require training and experience.  You do 
have to understand some basic patterns for data evolution, which will allow you 
to be successful rather than frustrated when inexperience or lack of knowledge 
leads to failure.
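
To make one of those patterns concrete, here is a minimal sketch (the class and 
field names are made up for illustration): an explicit serial version id that 
is kept stable across compatible changes, plus a readObject() that supplies a 
sensible default when an older stream does not carry a field added later.

        import java.io.*;

        public class DownholeReading implements Serializable {
            // Declared explicitly and kept stable across compatible changes.
            private static final long serialVersionUID = 1L;

            private float downholeTemp;   // present since the first version
            private String units;         // added in a later version

            private void readObject( ObjectInputStream in )
                    throws IOException, ClassNotFoundException {
                in.defaultReadObject();
                // An old stream has no "units" field; defaultReadObject()
                // leaves it null, so fill in the value the old code implied.
                if( units == null ) {
                    units = "degF";
                }
            }
        }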

A large majority of web development and evolution happened because 
inexperienced people were left in charge of the new platform.  All the other 
software developers were already working on other things where their experience 
and knowledge were required.  The explosion of demand for “HTML” developers 
starting in 1995 was not a big deal.  But when Java applets appeared, suddenly 
you had to know a lot more than how to structure an HTML document.  Java’s 
Exception design frustrated people who had never had to deal with runtime 
problems like connection loss or malformed data; their applets kept dying 
because they didn’t know that you needed to use

        while( !done ) {
            try {
                ...
            } catch( RuntimeException ex ) {
                ex.printStackTrace();
            }
        }

as the basis for any path of execution that interacted with externally provided 
data where strings might not be formatted correctly.

There are just so many intricate details to knowing how to be successful at 
software that interacts with “the world.”

I know the above can sound pretty arrogant and/or condescending in tone.  
That’s not my intent.  I am trying to describe my view of the complexities, yet 
also demonstrate the common ground on which I base my view that “data” is 
“data”.

The structure of the data requires “knowledge” on our part and on the part of 
the software involved.  The thing that makes JSON or XML convenient is the 
libraries that provide the marshaling and unmarshaling.  If those didn’t exist, 
you would have to do an awful lot of development to create the lexer and parser 
to unpack the data.  How many times have you had to write a “string” quoter 
that would escape various characters so that you could write out strings as

        "{ \"downhole_temp\" : " + valueFor( downholeTemp ) + " }"

for JSON marshaling?  More often than not you use the built-in marshaling of 
the native objects with

        JSONObject obj = new JSONObject();
        obj.put( "downhole_temp", downholeTemp );
        ...
        writeObjectReply( obj.toString() );

instead, right?

Java serialization has exactly the same facilities for managing the transition 
from native objects to a “transportable” format and back.  That’s all that is 
required.  Are there other details to manage with Java serialization, besides 
“format”?  Yes, and that’s the commonality it shares with JSON, XML or even 
Modbus as a protocol.  You have to know what is on the other end.  You have to 
plan for how to evolve what you are sending against anything that you might 
want to send.  Modbus provides for evolution through the 7th byte (the RTU # or 
device address) in the packet example above.  If the value of that byte is 
>247, then it’s no longer standard Modbus, but rather something else.  There 
are many examples of other companies evolving Modbus for their devices because 
of the (many) limitations of the original protocol.  Modbus was designed a long 
time ago, but even then they understood that there were limits to what was 
possible with their design, and they provided an “out”.  Further, they took 
care of the “length” issue by framing messages with an inter-character silence 
of 3 1/2 character times on a serial port for ModbusRTU.  The ModbusTCP packet 
above includes the length of the data in the third word (bytes 5 and 6) of the 
packet, so that after looking at that 7th byte and seeing that it’s not your 
address, you can skip the rest of the packet and ignore it.
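
A sketch of that filtering logic against the request bytes above (again just 
plain JDK calls, with a made-up station address):

        byte[] frame = {
            0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x01, 0x03,
            0x00, 0x00, 0x00, 0x02
        };
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.wrap( frame );
        int length = buf.getShort( 4 ) & 0xFFFF;   // bytes 5 and 6: 6 bytes follow
        int unitId = buf.get( 6 ) & 0xFF;          // the 7th byte
        int myAddress = 9;                         // hypothetical station address
        if( unitId != myAddress ) {
            // Not addressed to us: the length field tells us how much of the
            // stream to skip before the next MBAP header starts.
        }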

Thus, data evolution and machine-to-machine communication evolution are not 
“new” stuff.  They have been a natural part of system design for a long time.  
We have the opportunity with Jini to allow for “varied” communication protocols 
because it’s software.  Can we create something that abstracts communication 
from the application?  It’s already there in the endpoint design.  That 
mechanism is how we can “vary” the transport of the data.  Can we also abstract 
the data format?  Can we have JSON from a web service end up as Java objects?  
That’s already there with the smart proxy mechanism.  Smart proxies can be 
manufactured by “proxy” services which simply export an object as a service; 
that object knows how to call the web service, get JSON objects, and turn them 
into Java objects exposed through the smart proxy’s service interface design.
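
A rough sketch of that last idea (every name here is hypothetical, and a real 
proxy service would use a proper JSON library rather than the stand-in parsing 
below): the smart proxy is just a serializable object implementing the service 
interface, and whatever it does behind that interface is invisible to the 
client.

        public interface DownholeService {
            double currentDownholeTemp() throws java.io.IOException;
        }

        // Exported/registered by a "proxy" service; after download it runs in
        // the client, which only ever sees Java objects.
        public class WebServiceProxy implements DownholeService, java.io.Serializable {
            private static final long serialVersionUID = 1L;
            private final String endpointUrl;

            public WebServiceProxy( String endpointUrl ) {
                this.endpointUrl = endpointUrl;
            }

            public double currentDownholeTemp() throws java.io.IOException {
                // Fetch something like { "downhole_temp" : 42 } and unmarshal
                // it here, so the JSON never leaks through the interface.
                String json = fetchJson( endpointUrl );
                java.util.regex.Matcher m = java.util.regex.Pattern
                    .compile( "\"downhole_temp\"\\s*:\\s*([0-9.]+)" ).matcher( json );
                if( !m.find() )
                    throw new java.io.IOException( "unexpected reply: " + json );
                return Double.parseDouble( m.group( 1 ) );
            }

            private String fetchJson( String url ) throws java.io.IOException {
                try( java.util.Scanner s = new java.util.Scanner(
                        new java.net.URL( url ).openStream(), "UTF-8" ) ) {
                    return s.useDelimiter( "\\A" ).next();
                }
            }
        }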

I am really interested in this conversation and in all the experiences people 
have had with not being successful with Jini.  I want to try to sort the 
problems into either missing knowledge of how you might use Jini/Java 
differently, or real problems that we need to solve with architecture 
evolution.

All of the things that I’ve tried to push into the Jini community as solutions 
have been meant to start conversations and to show where I’ve encountered 
friction which I felt software changes to the platform could alleviate better 
than an external workaround/solution.

Gregg

> 
> And I think (purely speculative) that many people saw exactly this problem
> quite early on, whereas myself I was at the time mostly in relatively small
> confined and controlled environments, where up-to-date was managed. And
> took me much longer to realize the downsides that are inherent.
> 
> Cheers
> Niclas
> 
> On Sat, Feb 4, 2017 at 3:35 PM, Gregg Wonderly <ge...@cox.net> wrote:
> 
>> 
>>> On Feb 3, 2017, at 8:43 PM, Niclas Hedhman <nic...@hedhman.org> wrote:
>>> 
>>> On Fri, Feb 3, 2017 at 12:23 PM, Peter <j...@zeus.net.au> wrote:
>>> 
>>>> 
>>>> No serialization or Remote method invocation framework currently
>> supports
>>>> OSGi very well, one that works well and can provide security might gain
>> a
>>>> lot of new interest from that user base.
>>> 
>>> 
>>> What do you mean by this? Jackson's ObjectMapper doesn't have problems on
>>> OSGi. You are formulating the problem wrongly, and if formulated
>> correctly,
>>> perhaps one realizes why Java Serialization fell out of fashion rather
>>> quickly 10-12 years ago, when people realized that code mobility (as done
>>> in Java serialization/RMI) caused a lot of problems.
>> 
>> I’ve seen and heard of many poorly designed pieces of software.  But, the
>> serialization for Java has some very easily managed details which can
>> trivially allow you to be 100% successful with the use of Serialization.
>> I’ve never encountered problems with serialization.  I learned early on
>> about using explicit versioning for any serialization format, and then
>> providing evolution based changes instead of replacement based changes.  It
>> takes some experience and thought for sure.  But, in the end, it’s really
>> no different from using JSON, XML or anything else.  The format of what you
>> send has to be able to change, the content which must remain in a
>> compatible way has to remain accessible in the same way.  I really am
>> saddened by the thought that so many people never learn about binary
>> structured data in their classes or through materials they might read to
>> learn about such things.
>> 
>> What generally happens is that people forget to design extensibility into
>> their data systems, and then end up with all kinds of problems.   Here’s
>> some of the rules I always try to follow.
>> 
>> 1. Remote interfaces should almost always pass non native type objects
>> that wrap the data needed.  This will make sure you can seamlessly add more
>> data without changing method signatures.
>> 2. Always put a serial version id on your serialized classes.  Start with
>> 1, and increment it as you make changes by more than just ‘1’.
>> 3. When you are going to add a new value, think about how you can make
>> that independent of existing serialized data.  For example, when you
>> override readObject or writeObject methods, how will you make sure that
>> those methods can cast the data for “this” version of the data without
>> breaking past or future versions of the object.
>> 4. Data values inside of serialized classes should be carefully designed
>> so that there is a “not present” value that is in line with a “not
>> initialized” value so that you can always insert a new format in between
>> those two (see rule 2 above about leaving holes in the versions).
>> 
>> The purpose of serializing objects is so that you can also send the
>> correct code.  If you can’t send the correct code (you are just sending
>> JSON), and instead have to figure out how to make your new data compatible
>> with code that can’t change, how is that any less complex than designing
>> readObject and writeObject implementations that must do the same thing when
>> you load an old serialization of an object into a new version of the
>> object?  In this case, readObject() needs to be able to inspect the new
>> values that the new code uses in readObject and provide initial values for
>> them just like the constructor(s) would do if the object was created new.
>> 
>> I really have never found anything that shipping JSON around makes any
>> simpler.   You still have to have a parsable JSON string value.  You still
>> have to migrate data formats when their is an old object receive by new
>> code.
>> 
>> The biggest problem of old was people not using an explicit serial version
>> id.  Several times, I have had to add an explicit serial version id to old
>> code so that it would deserialize correctly into new classes.  Sometimes it
>> is hard to do that.  But, that’s not a problem with the system as much as
>> it is a lack of understanding or actual neglect in following the design
>> standards of the serialization process.
>> 
>> Gregg
>> 
>>> 
>>> IMHO, RMI/Serialization's design is flawed. Mixing too many concerns in
>> the
>>> same abstraction; sandboxing w/ integration , code mobility, class
>>> resolution, versioning and deserialization, with very little hooks to
>>> cusomize any or all of these aspects. And these aspects should not have
>>> been wrapped into one monolith.
>>> 
>>> Further, I think the only "sane" approach in a OSGi environment is to
>>> create a new bundle for the Remote environment, all codebases not part of
>>> the API goes into that bundle and that the API is required to be present
>> in
>>> the OSGi environment a priori. I.e. treat the Remote objects in OSGi as
>> it
>>> is treated in plain Java; one classloader, one chunk, sort out its own
>>> serialization woes. Likewise for the server; treat it as ordinary RMI,
>>> without any mumbo-jambo OSGi stuff to be figured out at a
>> non-OSGi-running
>>> JVM. An important difference is that in OSGi, the BundleClassLoader is
>> not
>>> (required to be) a URLClassLoader, so the Java serialization's auto
>>> annotation of globally reachable URLs won't work, and one need to rely on
>>> java.rmi.server.codebase property, but a bundle could watch for loaded
>>> bundles and build that up for URLs that can be resolved globally.
>>> 
>>> 
>>> Cheers
>>> --
>>> Niclas Hedhman, Software Developer
>>> http://polygene.apache.org <http://zest.apache.org> - New Energy for
>> Java
>> 
>> 
> 
> 
> -- 
> Niclas Hedhman, Software Developer
> http://polygene.apache.org <http://zest.apache.org> - New Energy for Java
