> On Feb 4, 2017, at 3:37 AM, Niclas Hedhman <nic...@hedhman.org> wrote:
>
> Gregg,
> I know that you can manage to "evolve" the binary format if you are incredibly careful and don't make mistakes. BUT, that seems really hard, since EVEN Sun/Oracle state that using Serialization for "long lived objects" is highly discouraged. THAT is a sign that it is not nearly as easy as you make it sound, and it is definitely different from XML/JSON: once the working codebase is lost (i.e. either literally lost (yes, I have been involved in trying to restore that), or modified so much that compatibility broke, which happens when serialization is not the primary focus of a project), then you are pretty much screwed forever, unlike XML/JSON.
I think there are some real issues, as you describe here. Certainly if the XML or JSON can be “read”, you can get some of the data out of it. Java Serialization, or any binary structure, requires more knowledge to extract the “data” from. I am not going to argue that point, other than to say that, for sure, you have to understand the implications of this failure mode and do the right things up front so that you do have documentation, a documented serial version id plan, etc. Not impossible, but indeed additional “work”.

> Now, you may say, that is for "long lived serialized states" but we are dealing with "short lived" ones. However, in today's architectures and platforms, almost no organization manages to keep all parts of a system synchronized when it comes to versioning. Different parts of a system are upgraded at different rates. And this is essentially the same as "long lived objects" --- "uh, this was serialized using LibA 1.1, LibB 2.3 and JRE 1.4, and we are now at LibA 4.6, LibB 3.1 and Java 8" --- do you see the similarity? If not, then I will not be able to convince you. If you do, then ask "why did Sun/Oracle state that long-lived objects with Java Serialization were a bad idea?", or were they also clueless on how to do it right, which seems to be your actual argument.

My actual argument is that “data” is “data”. It doesn’t matter how it’s “structured”. The only thing that JSON or XML has on “binary” is that you can “look” at it with your eyes and feel more comfortable with what you see. If I typed the following two byte sequences at you, what could you tell me about them?

00 01 00 00 00 06 01 03 00 00 00 02
00 01 00 00 00 06 01 04 42 28 00 00

In the right context, you could tell me that the first is a ModbusTCP request for two holding registers, 40001 and 40002. Further, you’d look at the reply packet and say that the two returned registers look like a floating point number, because the first data byte is 42. Further, you could tell me that the floating point number itself is actually the value 42.0. My point is that there is always context (a ModbusTCP conversation log), knowledge (I know Modbus like the back of my hand) and experience (I know what the general structure of IEEE floating point is, and because I have stared at these byte streams when I knew there were floating point numbers involved, I can recognize this).

Would I be faster to know what I was looking at if I saw { “downhole_temp” : 42 } instead? Sure. But that “costs” bandwidth across my cellular modem link, and it would further decrease the total number of requests I can send across that fixed bandwidth if I sent JSON instead of binary data. My point is that it’s just data, but the binary form satisfies another need I have: reducing bandwidth between the source of the data and the user of the data improves system performance.

Additionally, it is not “free” to marshal and unmarshal JSON or XML for use by an application. I use large (100k or more) XML documents to “describe” the details of devices that use Modbus communications. I do that because I can then use XSLT to transform them into HTML documents for human consumption, so these technical descriptions can be reviewed visually, where it is easier to depict the details. Thus 42280000 becomes something like { “downhole_temp” : 42 } to ease consumption for those who don’t have the training, experience and knowledge I have.

Java Serialization has adequate control points to manage evolution of the data in ways that really are “evolution”.
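To make that concrete, here is a minimal sketch in plain Java (no Modbus library, nothing but the four data bytes from the reply above) of that last decoding step:

    import java.nio.ByteBuffer;

    public class RegisterDecode {
        public static void main(String[] args) {
            // The two 16-bit holding registers from the reply above, as raw bytes,
            // high word first as they appear in that packet.
            byte[] data = { (byte) 0x42, 0x28, 0x00, 0x00 };

            // ByteBuffer is big-endian by default, so the four bytes read back
            // directly as one IEEE 754 float.
            float value = ByteBuffer.wrap(data).getFloat();

            System.out.println(value);   // prints 42.0
        }
    }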
You do have to understand precisely what the effect of your changes to the “data” in your object will be, and how code referencing that “data”, either directly or “functionally”, can cope with what is going on. That is an important detail. It does require training and experience. You do have to understand some basic patterns for data evolution which will allow you to be successful, rather than be frustrated by inexperience or lack of knowledge leading to failure.

A large part of web development and its evolution happened because inexperienced people were left in charge of the new platform. All the other software developers were already working on other things where they had the experience and knowledge that was required. The explosion of demand for “HTML” developers starting in 1995 was not a big deal. But when Java applets appeared, suddenly you had to know a lot more than how to structure an HTML document. Java’s Exception design caused people who had never had to deal with runtime problems, like connection loss or malformed data, to be frustrated that their applets kept dying, because they didn’t know that you needed to use

    while( !done ) { try { … } catch( RuntimeException ex ) { ex.printStackTrace(); } }

as the basis for any path of execution that interacted with externally provided data where strings might not be formatted correctly. There are just so many intricate details to knowing how to be successful at software that interacts with “the world.”

I know the above can sound pretty arrogant and/or condescending in tone. That’s not my intent. I am trying to describe my view of the complexities, yet also demonstrate the common ground on which I base my view that “data” is “data”. The structure of the data requires “knowledge”, both in us and in the software involved. The thing that makes JSON or XML convenient is the libraries that provide the marshal and unmarshal activities. If those didn’t exist, you would have to do an awful lot of development to create the lexer and parser to unpack the data. How many times have you had to write a “string” quoter that would escape various characters so that you could write out strings as “{ \”downhole_temp\” : “+valueFor(downholeTemp)+” }” for JSON marshaling? More often than not you use the built-in marshaling of the native objects, with something like

    JObject obj = new JObject();
    obj.add( “downhole_temp”, downholeTemp );
    …
    writeObjectReply( obj.toString() );

instead, right? Java serialization has exactly the same facilities for managing the transition from native objects to a “transportable” format and back. That’s all that is required. Are there other details to manage with Java serialization besides “format”? Yes, and that’s the commonality it shares with JSON, XML or even Modbus as a protocol. You have to know what is on the other end. You have to plan for how to evolve what you are sending against anything that you might want to send.

Modbus provides evolution through the 7th byte (the RTU # or device address) in the above packet example. If the value of that byte is >247, then it’s no longer standard Modbus, but rather something else. There are many examples of other companies evolving Modbus for their devices because of the (many) limitations of the original protocol. Modbus was designed a long time ago, but even then, they understood that there were limits to what was possible with their design and provided an “out”. Further, they took care of the “length” issue for ModbusRTU by ending a frame with an inter-character silence of 3 1/2 character times on the serial port.
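As a sketch of the control points I mean, here is the standard explicit serialVersionUID plus readObject idiom; the class and field names (DownholeSample, downholeTemp, units) are made up for illustration:

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.Serializable;

    // Hypothetical value object: a later release added the units field, but the
    // serial version id stays at 1 so streams written by the first release still load.
    public class DownholeSample implements Serializable {
        private static final long serialVersionUID = 1L;

        private float downholeTemp;

        // Added later; streams written by the old code will leave this null.
        private String units;

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();      // fills in whatever fields the stream contains
            if (units == null) {         // the "not present" value from an old stream
                units = "degF";          // supply the default a constructor would have
            }
        }
    }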
The above Modbus TCP packet includes the length of the data in the third word (bytes 5 and 6) of the packet, so that after looking at that 7th byte and seeing that it’s not your address, you can ignore the packet. Thus, data evolution and machine-to-machine communication evolution is not “new” stuff. It’s been a natural part of system design for a long time.

We have the opportunity with Jini to allow for “varied” communication protocols because it’s software. Can we create something that abstracts communication from the application? It’s already there in the endpoint design. That mechanism is how we can “vary” the transport of the data. Can we also abstract the data format? Can we have JSON from a web service end up as Java objects? That’s already there with the smart proxy mechanism. Smart proxies can be manufactured by “proxy” services, which simply export, as a service, an object that knows how to call the web service, get JSON back, and turn it into the Java objects exposed through that smart proxy’s service interface.
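To sketch that idea in code: everything here is made up for illustration (the DownholeService interface, the endpoint URL, the toy JSON handling), and a real proxy would use whatever JSON library is at hand and be exported through the usual Jini machinery:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Serializable;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // The service interface clients see; nothing about JSON or HTTP leaks through it.
    interface DownholeService {
        float downholeTemp() throws IOException;
    }

    // The object a "proxy" service could register as the smart proxy.
    class JsonBackedDownholeService implements DownholeService, Serializable {
        private static final long serialVersionUID = 1L;
        private final String endpoint;

        JsonBackedDownholeService(String endpoint) {
            this.endpoint = endpoint;
        }

        public float downholeTemp() throws IOException {
            HttpURLConnection con =
                (HttpURLConnection) new URL(endpoint).openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), "UTF-8"))) {
                String json = in.readLine();   // e.g. { "downhole_temp" : 42 }
                // Toy extraction of the single numeric value; stands in for real parsing.
                String value = json.replaceAll("[^0-9.]", "");
                return Float.parseFloat(value);
            }
        }
    }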
I am really interested in this conversation and all the experiences people have had with not being successful with Jini. I want to try to expand those problems into either missing knowledge of how you might use Jini/Java differently, or into real problems that we need to solve with architecture evolution. All of the things that I’ve tried to push into the Jini community as solutions have been meant to start conversations and show where I’ve encountered friction which I felt software changes to the platform could alleviate better than an external workaround/solution.

Gregg

>
> And I think (purely speculative) that many people saw exactly this problem quite early on, whereas I was at the time mostly in relatively small, confined and controlled environments, where up-to-date was managed. And it took me much longer to realize the downsides that are inherent.
>
> Cheers
> Niclas
>
> On Sat, Feb 4, 2017 at 3:35 PM, Gregg Wonderly <ge...@cox.net> wrote:
>
>>
>>> On Feb 3, 2017, at 8:43 PM, Niclas Hedhman <nic...@hedhman.org> wrote:
>>>
>>> On Fri, Feb 3, 2017 at 12:23 PM, Peter <j...@zeus.net.au> wrote:
>>>
>>>>
>>>> No serialization or Remote method invocation framework currently supports OSGi very well; one that works well and can provide security might gain a lot of new interest from that user base.
>>>
>>>
>>> What do you mean by this? Jackson's ObjectMapper doesn't have problems on OSGi. You are formulating the problem wrongly, and if formulated correctly, perhaps one realizes why Java Serialization fell out of fashion rather quickly 10-12 years ago, when people realized that code mobility (as done in Java serialization/RMI) caused a lot of problems.
>>
>> I’ve seen and heard of many poorly designed pieces of software. But the serialization for Java has some very easily managed details which can trivially allow you to be 100% successful with the use of Serialization. I’ve never encountered problems with serialization. I learned early on about using explicit versioning for any serialization format, and then providing evolution-based changes instead of replacement-based changes. It takes some experience and thought, for sure. But, in the end, it’s really no different from using JSON, XML or anything else. The format of what you send has to be able to change, and the content which must remain compatible has to remain accessible in the same way. I really am saddened by the thought that so many people never learn about binary structured data in their classes or through materials they might read to learn about such things.
>>
>> What generally happens is that people forget to design extensibility into their data systems, and then end up with all kinds of problems. Here are some of the rules I always try to follow.
>>
>> 1. Remote interfaces should almost always pass non-native-type objects that wrap the data needed. This will make sure you can seamlessly add more data without changing method signatures.
>> 2. Always put a serial version id on your serialized classes. Start with 1, and increment it as you make changes by more than just ‘1’.
>> 3. When you are going to add a new value, think about how you can make it independent of existing serialized data. For example, when you override readObject or writeObject methods, how will you make sure that those methods can cast the data for “this” version of the data without breaking past or future versions of the object?
>> 4. Data values inside of serialized classes should be carefully designed so that there is a “not present” value that is in line with a “not initialized” value, so that you can always insert a new format in between those two (see rule 2 above about leaving holes in the versions).
>>
>> The purpose of serializing objects is so that you can also send the correct code. If you can’t send the correct code (you are just sending JSON), and instead have to figure out how to make your new data compatible with code that can’t change, how is that any less complex than designing readObject and writeObject implementations that must do the same thing when you load an old serialization of an object into a new version of the object? In this case, readObject() needs to be able to inspect the new values that the new code uses and provide initial values for them, just like the constructor(s) would do if the object was created new.
>>
>> I really have never found anything that shipping JSON around makes any simpler. You still have to have a parsable JSON string value. You still have to migrate data formats when there is an old object received by new code.
>>
>> The biggest problem of old was people not using an explicit serial version id. Several times, I have had to add an explicit serial version id to old code so that it would deserialize correctly into new classes. Sometimes it is hard to do that. But that’s not a problem with the system as much as it is a lack of understanding, or actual neglect, in following the design standards of the serialization process.
>>
>> Gregg
>>
>>>
>>> IMHO, RMI/Serialization's design is flawed. It mixes too many concerns in the same abstraction; sandboxing w/ integration, code mobility, class resolution, versioning and deserialization, with very few hooks to customize any or all of these aspects. And these aspects should not have been wrapped into one monolith.
>>>
>>> Further, I think the only "sane" approach in an OSGi environment is to create a new bundle for the Remote environment; all codebases not part of the API go into that bundle, and the API is required to be present in the OSGi environment a priori. I.e. treat the Remote objects in OSGi as they are treated in plain Java; one classloader, one chunk, sort out its own serialization woes.
>>> Likewise for the server; treat it as ordinary RMI, without any mumbo-jumbo OSGi stuff to be figured out at a non-OSGi-running JVM. An important difference is that in OSGi the BundleClassLoader is not (required to be) a URLClassLoader, so Java serialization's automatic annotation of globally reachable URLs won't work, and one needs to rely on the java.rmi.server.codebase property; but a bundle could watch for loaded bundles and build that up for URLs that can be resolved globally.
>>>
>>> Cheers
>>> --
>>> Niclas Hedhman, Software Developer
>>> http://polygene.apache.org <http://zest.apache.org> - New Energy for Java
>
>
> --
> Niclas Hedhman, Software Developer
> http://polygene.apache.org <http://zest.apache.org> - New Energy for Java