Can we do a test where we hard-code a codec for a POJO and compare its performance against Kryo? After that, we can dynamically compose a codec via PojoUtils and inject it.
Thanks

> On May 17, 2016, at 8:16 AM, Vlad Rozov <[email protected]> wrote:
>
> +1 for type based serialization. Tuples in most cases are flat records/POJOs
> and it should be possible to programmatically construct a codec that will
> significantly outperform Kryo. It should also reduce the amount of data
> passed over the wire. I started to look in that direction as well, as Kryo
> serialization is one of the bottlenecks that limits Apex throughput when
> operators are deployed into different containers, including the NODE_LOCAL
> case.
>
> Thank you,
> Vlad
>
>> On 5/17/16 07:13, Sandesh Hegde wrote:
>> If it is possible to serialize, the platform should do it automatically;
>> it reduces the tribal knowledge required to use the platform. A couple of
>> months back, I also sent out a similar email.
>>
>> Type based serialization may improve the performance.
>>
>>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <[email protected]>
>>> wrote:
>>>
>>> Traditionally, we've recommended using
>>> "@DefaultSerializer(JavaSerializer.class)" or
>>> "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
>>>
>>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
>>>
>>> Can you describe why those approaches are not adequate?
>>>
>>> Ram
>>>
>>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> While working on the integration of Apex with Apache Samoa, I am coming
>>>> across some scenarios where I have to add default constructors in some
>>>> external classes to make them Kryo serializable. Although this should be
>>>> okay, we would like to avoid modifying external classes as far as
>>>> possible. Some other streaming engines have taken different approaches
>>>> towards serialization.
>>>>
>>>> I looked at Flink and Storm serialization mechanisms.
>>>>
>>>> Storm has a fallback mechanism to Java serialization. It uses Kryo for
>>>> serialization because of its performance. But if the class is not
>>>> serializable using Kryo, then it will try to serialize it using Java
>>>> serialization. If even then it cannot serialize, it throws an error. [1]
>>>>
>>>> Flink has its own serialization stack where it uses a serializer based on
>>>> the type information known about the data. [2]
>>>>
>>>> What does the community think about the current state of serialization in
>>>> Apex? Is there a need to explore some approaches which could avoid
>>>> serialization issues such as the one described above? Are there any other
>>>> approaches one could use?
>>>>
>>>> 1. http://storm.apache.org/releases/current/Serialization.html#java-serialization
>>>> 2. https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
>>>>
>>>> ~Bhupesh
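
For reference, the hard-coded codec proposed at the top of the thread could look roughly like the sketch below. The Quote class and its three fields are hypothetical (they are not taken from Apex or Samoa), and the wire layout is only one possible choice: fields are written in a fixed order with no class name or field tags, which is where the size saving over a generic serializer would be expected to come from.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: a hand-written codec for one hypothetical flat POJO.
final class HandCodedCodecSketch {

    /** Hypothetical tuple type, used purely for illustration. */
    public static class Quote {
        public String symbol;
        public long timestamp;
        public double price;

        public Quote() {
        }

        public Quote(String symbol, long timestamp, double price) {
            this.symbol = symbol;
            this.timestamp = timestamp;
            this.price = price;
        }
    }

    /** Layout: int symbol length, UTF-8 symbol bytes, long timestamp, double price. */
    public static byte[] encode(Quote q) {
        byte[] symbol = q.symbol.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + symbol.length + 8 + 8);
        buf.putInt(symbol.length);
        buf.put(symbol);
        buf.putLong(q.timestamp);
        buf.putDouble(q.price);
        return buf.array();
    }

    /** Reads the fields back in exactly the order encode() wrote them. */
    public static Quote decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] symbol = new byte[buf.getInt()];
        buf.get(symbol);
        Quote q = new Quote();
        q.symbol = new String(symbol, StandardCharsets.UTF_8);
        q.timestamp = buf.getLong();
        q.price = buf.getDouble();
        return q;
    }
}
```

Calling HandCodedCodecSketch.encode(new Quote("APEX", 1463472960000L, 12.5)) and passing the result to decode() round-trips the tuple. Timing this against Kryo would additionally need Kryo on the classpath and a proper benchmark harness, neither of which is shown here.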

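
The type-based approach discussed in the thread (composing a codec from what is known about the tuple's fields) can be illustrated with plain java.lang.reflect. This is a sketch under stated assumptions, not Apex's or Flink's implementation, and it does not use PojoUtils, whose API is not shown in the thread. It assumes a flat POJO with public int, long, double, or non-null String fields and a no-argument constructor; the Reading class is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: a codec derived once from a POJO's public fields.
final class ReflectiveCodecSketch<T> {

    /** Hypothetical flat tuple type, used purely for illustration. */
    public static class Reading {
        public String sensor;
        public int count;
        public double value;
    }

    private final Class<T> type;
    private final List<Field> fields = new ArrayList<>();

    public ReflectiveCodecSketch(Class<T> type) {
        this.type = type;
        for (Field f : type.getFields()) { // public fields only
            if (Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            Class<?> t = f.getType();
            if (t != int.class && t != long.class && t != double.class && t != String.class) {
                throw new IllegalArgumentException("unsupported field type: " + f);
            }
            fields.add(f);
        }
        // getFields() guarantees no order; sort by name so writer and reader agree.
        fields.sort(Comparator.comparing(Field::getName));
    }

    public byte[] encode(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (Field f : fields) {
                Class<?> t = f.getType();
                if (t == int.class) out.writeInt(f.getInt(value));
                else if (t == long.class) out.writeLong(f.getLong(value));
                else if (t == double.class) out.writeDouble(f.getDouble(value));
                else out.writeUTF((String) f.get(value)); // null strings are not handled
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public T decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            T value = type.getDeclaredConstructor().newInstance(); // needs a no-arg constructor
            for (Field f : fields) {
                Class<?> t = f.getType();
                if (t == int.class) f.setInt(value, in.readInt());
                else if (t == long.class) f.setLong(value, in.readLong());
                else if (t == double.class) f.setDouble(value, in.readDouble());
                else f.set(value, in.readUTF());
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this sketch still relies on a no-argument constructor when decoding, so by itself it does not remove the constraint that started the thread. Per-field reflection is also unlikely to match a hand-written or generated codec in speed; the point is only to show how the field list can be derived from the type once and then reused for every tuple.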