[ 
https://issues.apache.org/jira/browse/BEAM-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622842#comment-15622842
 ] 

Amit Sela commented on BEAM-626:
--------------------------------

Those objects rely on Java serialization, which Kryo serialization 
outperforms.
This may not be an issue for other frameworks, since they serialize tasks (the 
DoFn and its contents) to a worker only when a new worker is 
deployed/utilized, but Spark Streaming has to do this every batch interval 
(sometimes as frequently as every 500 msec), so it is better to use Kryo. 

So far {{AvroCoder}} is the only issue, and the entire Spark community (which 
is one of the largest Apache communities) deals with Kryo just fine.

As discussed before, if this becomes a pain I could expose an API to register 
serializers (falling back to {{JavaSerialization}} is one of them).

What I really don't understand is the stance on this ticket - "AvroCoder not 
deserializing correctly in Kryo" - as in, making the coder work with Kryo.
If the proposed solution degraded the quality/performance of the current 
implementation, I would be the first to suggest we simply register with Java 
serialization, but this is not the case. The first proposal wasn't thread-safe, 
which is no different from the current state, and adding synchronization might 
hurt performance (regardless of Kryo).
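
To make the synchronization trade-off concrete, here is a minimal sketch, 
assuming the coder's transient members are rebuilt lazily on first access. It 
uses double-checked locking so that only the first call takes a lock. The 
names {{LazyMember}} and {{Helper}} are hypothetical; this is not Beam code 
and not the proposal discussed on the ticket.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for a member that is rebuilt lazily after deserialization.
final class LazyMember implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for an expensive, non-serializable member.
    static final class Helper {
        final String spec;
        Helper(String spec) { this.spec = spec; }
    }

    // Counts how often the helper was built; for demonstration only.
    final AtomicInteger builds = new AtomicInteger();

    private final String spec;                 // serialized state
    private transient volatile Helper helper;  // volatile is required for double-checked locking

    LazyMember(String spec) { this.spec = spec; }

    Helper get() {
        Helper local = helper;                 // one volatile read on the common path
        if (local == null) {
            synchronized (this) {              // lock is taken only while the member is unset
                local = helper;
                if (local == null) {
                    builds.incrementAndGet();
                    local = new Helper(spec);
                    helper = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) throws Exception {
        LazyMember holder = new LazyMember("example");
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> holder.get());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("helper built " + holder.builds.get() + " time(s)");
    }
}
```

Once the member is set, callers pay for a volatile read rather than a lock. 
Whether that cost matters for a coder would need to be measured.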

From my point of view, enabling the coder to work with Kryo while NOT making 
it worse in any way is a good thing.

> AvroCoder not deserializing correctly in Kryo
> ---------------------------------------------
>
>                 Key: BEAM-626
>                 URL: https://issues.apache.org/jira/browse/BEAM-626
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>            Priority: Minor
>
> Unlike with Java serialization, when deserializing AvroCoder using Kryo, the 
> resulting AvroCoder is missing all of its transient fields.
> The reason it works with Java serialization is because of the usage of 
> writeReplace and readResolve, which Kryo does not adhere to.
> In ProtoCoder for example there are also unserializable members, the way it 
> is solved there is lazy initializing these members via their getters, so they 
> are initialized in the deserialized object on first call to the member.
> It seems AvroCoder is the only class in Beam to use writeReplace convention.
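
The lazy-initialization approach that the description attributes to 
ProtoCoder can be sketched as follows. This is a minimal illustration, not 
Beam code: {{LazySpecCoder}} and {{ParsedSpec}} are hypothetical names, and 
the round trip uses plain Java serialization as a stand-in for any serializer 
that skips transient fields. The getter does not depend on 
writeReplace/readResolve being called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical coder-like class with one serialized field and one derived member.
final class LazySpecCoder implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for a non-serializable member (for example, a parsed schema).
    static final class ParsedSpec {
        final String[] fields;
        ParsedSpec(String spec) { this.fields = spec.split(","); }
    }

    private final String spec;            // written by the serializer
    private transient ParsedSpec parsed;  // skipped by the serializer, null after deserialization

    LazySpecCoder(String spec) { this.spec = spec; }

    // Rebuilds the transient member on first use (not thread-safe as written).
    ParsedSpec getParsed() {
        if (parsed == null) {
            parsed = new ParsedSpec(spec);
        }
        return parsed;
    }

    boolean isInitialized() { return parsed != null; }

    // Serializes and deserializes the coder in memory.
    static LazySpecCoder roundTrip(LazySpecCoder in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LazySpecCoder) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LazySpecCoder original = new LazySpecCoder("a,b,c");
        original.getParsed();
        LazySpecCoder copy = roundTrip(original);
        System.out.println("initialized right after deserialization: " + copy.isInitialized());
        System.out.println("fields after first getter call: " + copy.getParsed().fields.length);
    }
}
```

All internal access has to go through the getter for this to hold; any direct 
read of the transient field after deserialization would see null.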



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
