Jozef Vilcek created BEAM-7206:
----------------------------------

             Summary: Coder copy overhead
                 Key: BEAM-7206
                 URL: https://issues.apache.org/jira/browse/BEAM-7206
             Project: Beam
          Issue Type: Improvement
          Components: runner-flink, sdk-java-core
            Reporter: Jozef Vilcek


More context can be found in discussion here:

[http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwomb2frzute81zsrxnm13t...@mail.gmail.com%3E]

I am not sure how much is this runner dependent, but each operator's user 
function receives a copy of data element for isolation. Beam coders does copy 
by serializing to bytes and then deserialize back. This seems to impact 
performance and grows with job complexity.

On a simple test pipeline described in discussion thread above, I noticed 
almost 2x speedup when CoderUtils.copy() just returned the object. 

Native Flink job does copy too, but via Kryo, which seems to be doing deep copy 
more effectively, on object level.

What can be done in Beam to reduce this overhead?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to