Jozef Vilcek created BEAM-7206:
----------------------------------
Summary: Coder copy overhead
Key: BEAM-7206
URL: https://issues.apache.org/jira/browse/BEAM-7206
Project: Beam
Issue Type: Improvement
Components: runner-flink, sdk-java-core
Reporter: Jozef Vilcek
More context can be found in discussion here:
[http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwomb2frzute81zsrxnm13t...@mail.gmail.com%3E]
I am not sure how much is this runner dependent, but each operator's user
function receives a copy of data element for isolation. Beam coders does copy
by serializing to bytes and then deserialize back. This seems to impact
performance and grows with job complexity.
On a simple test pipeline described in discussion thread above, I noticed
almost 2x speedup when CoderUtils.copy() just returned the object.
Native Flink job does copy too, but via Kryo, which seems to be doing deep copy
more effectively, on object level.
What can be done in Beam to reduce this overhead?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)