kennknowles opened a new issue, #19483:
URL: https://github.com/apache/beam/issues/19483

   More context can be found in the discussion here:
   
   
[http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwomb2frzute81zsrxnm13t...@mail.gmail.com%3E](http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwomb2frzute81zsrxnm13t...@mail.gmail.com%3E)
   
   I am not sure how runner-dependent this is, but each operator's user 
function receives a copy of the data element for isolation. Beam coders copy 
by serializing to bytes and then deserializing back. This seems to hurt 
performance, and the cost grows with job complexity.
   
   On the simple test pipeline described in the discussion thread above, I 
noticed an almost 2x speedup when CoderUtils.copy() just returned the object. 
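   
   To make the two behaviours being compared concrete, here is a minimal sketch. It is not Beam code: Java serialization stands in for a Beam coder as an assumption, and `CopySketch`, `copyViaBytes`, and `noCopy` are hypothetical names chosen for illustration. It only shows the shape of "copy by round-tripping through bytes" versus "return the same object".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration only: Java serialization is an assumed stand-in
// for a Beam coder; none of these names exist in Beam.
final class CopySketch {
  private CopySketch() {}

  /** Copies a value by encoding it to bytes and decoding it again. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T copyViaBytes(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("copy failed", e);
    }
  }

  /** The shortcut from the experiment: no copy, so no isolation either. */
  static <T> T noCopy(T value) {
    return value;
  }
}
```

   With `copyViaBytes`, a downstream function that mutates its input cannot affect the original, at the cost of a full encode and decode per element. With `noCopy` the cost disappears, but so does the isolation guarantee.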
   
   A native Flink job copies too, but via Kryo, which seems to deep copy more 
efficiently, at the object level.
   
   What can be done in Beam to reduce this overhead?
   
    
   
   Imported from Jira 
[BEAM-7206](https://issues.apache.org/jira/browse/BEAM-7206). Original Jira may 
contain additional context.
   Reported by: JozoVilcek.


