I think having a fallback to Java serialization is a good thing. I can imagine a user having trouble with Kryo serialization of their operator and unable to figure out then give up totally without us even knowing.
David On Tue, May 17, 2016 at 11:50 AM, Thomas Weise <[email protected]> wrote: > IMO automatically picking a serialializer conflicts with predictable system > behavior. If the serialization does not work I would want to know that > instead of the system doing some trick and arrive at suboptimal or faulty > behavior. > > That does not mean we cannot have optimizations though, as long as there is > explicit user control. > > Thomas > > > On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <[email protected]> > wrote: > > > As Ram ans Sandesh pointed out, we do have @Bind and @DefaultSerializer > > annotations. However, these are tightly coupled with the field in > question > > and do require modifying external code. Additionally it may also break > > other systems, if we are binding it to a JavaSerializer and perhaps there > > are systems which have other means of serializing the field. > > > > My point was more to do with user having to worry about what serializer > to > > use and how to serialize objects. > > For example, I liked the approach that Storm takes by falling back to > Java > > serialization automatically in case the target class does not have a > > default constructor. > > > > Of course, we can explore type based serialization. But this email was > more > > about the usability aspect; to handle classes not having default > > constructors in general, not just POJO tuples. > > > > ~Bhupesh > > > > > > > > On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <[email protected] > > > > wrote: > > > > > Can we do a test where we hard code a codec for a POJO and compare > > > performance against kryo. Thereafter we can dynamically compose a > > > codec via pojoutils and inject it. > > > > > > Thanks > > > > > > > On May 17, 2016, at 8:16 AM, Vlad Rozov <[email protected]> > > wrote: > > > > > > > > +1 for type based serialization. Tuples in most cases are flat > > > records/pojo and it should be possible programmatically construct a > codec > > > that will significantly outperform Kryo. It should also reduce amount > of > > > data passed over the wire. I started to look in that direction as well > as > > > Kryo serialization is one of bottlenecks that limits Apex throughput > when > > > operators are deployed into different containers including NODE_LOCAL > > case. > > > > > > > > Thank you, > > > > Vlad > > > > > > > >> On 5/17/16 07:13, Sandesh Hegde wrote: > > > >> If it is possible to serialize, platform should do it automatically, > > it > > > >> reduces the tribal knowledge requirement to use the platform. > Couples > > of > > > >> month back, I also sent out the similar email. > > > >> > > > >> Type based serialization may improve the performance. > > > >> > > > >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath < > [email protected] > > > > > > wrote: > > > >>> > > > >>> Traditionally, we've recommended using > > > >>> "@DefaultSerializer(JavaSerializer.class)" or > > > >>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at > > > >>> > > > >>> > > > > > > http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception > > > >>> > > > >>> Can you describe why those approaches are not adequate ? > > > >>> > > > >>> Ram > > > >>> > > > >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda < > > > [email protected]> > > > >>> wrote: > > > >>> > > > >>>> Hi All, > > > >>>> > > > >>>> While working on the integration of Apex with Apache Samoa, I am > > > coming > > > >>>> across some scenarios where I have to add default constructors in > > some > > > >>>> external classes to make them Kryo serializable. Although this > > should > > > be > > > >>>> okay, we would like to avoid modifying external classes as far as > > > >>> possible. > > > >>>> Some other streaming engines have taken different approaches > towards > > > >>>> serialization. > > > >>>> > > > >>>> I looked at Flink and Storm serialization mechanisms. > > > >>>> > > > >>>> Storm has a fall back mechanism on Java serialization. It does use > > > Kryo > > > >>> for > > > >>>> serialization due to performance. But, if the class is not > > > serializable > > > >>>> using Kryo, then it will try to serialize it using Java > > > serialization. If > > > >>>> even then it cannot serialize, then it throws an error. [1] > > > >>>> > > > >>>> Flink has its own serialization stack where it uses a serializer > > > based on > > > >>>> the type information known about the data. [2] > > > >>>> > > > >>>> What does the community think about the current state of > > > serialization in > > > >>>> Apex. Is there a need to explore some approaches which could avoid > > > >>>> serialization issues such as the one described above? Are there > any > > > other > > > >>>> approaches one could use? > > > >>>> > > > >>>> 1. > > > >>> > > > > > > http://storm.apache.org/releases/current/Serialization.html#java-serialization > > > >>>> 2. > > > >>> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization > > > >>>> > > > >>>> ~Bhupesh > > > > > > > > > >
