+1 for type based serialization. Tuples in most cases are flat records/pojo and it should be possible programmatically construct a codec that will significantly outperform Kryo. It should also reduce amount of data passed over the wire. I started to look in that direction as well as Kryo serialization is one of bottlenecks that limits Apex throughput when operators are deployed into different containers including NODE_LOCAL case.

Thank you,
Vlad

On 5/17/16 07:13, Sandesh Hegde wrote:
If it is possible to serialize, platform should do it automatically, it
reduces the tribal knowledge requirement to use the platform. Couples of
month back, I also sent out the similar email.

Type based serialization may improve the performance.

On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <[email protected]> wrote:

Traditionally, we've recommended using
"@DefaultSerializer(JavaSerializer.class)" or
"@FieldSerializer.Bind(CustomSerializer.class)" as outlined at

http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception

Can you describe why those approaches are not adequate ?

Ram

On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <[email protected]>
wrote:

Hi All,

While working on the integration of Apex with Apache Samoa, I am coming
across some scenarios where I have to add default constructors in some
external classes to make them Kryo serializable. Although this should be
okay, we would like to avoid modifying external classes as far as
possible.
Some other streaming engines have taken different approaches towards
serialization.

I looked at Flink and Storm serialization mechanisms.

Storm has a fall back mechanism on Java serialization. It does use Kryo
for
serialization due to performance. But, if the class is not serializable
using Kryo, then it will try to serialize it using Java serialization. If
even then it cannot serialize, then it throws an error. [1]

Flink has its own serialization stack where it uses a serializer based on
the type information known about the data. [2]

What does the community think about the current state of serialization in
Apex. Is there a need to explore some approaches which could avoid
serialization issues such as the one described above? Are there any other
approaches one could use?

1.


http://storm.apache.org/releases/current/Serialization.html#java-serialization
2.


https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization

~Bhupesh


Reply via email to