[
https://issues.apache.org/jira/browse/BEAM-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831547#comment-16831547
]
Jozef Vilcek commented on BEAM-7206:
------------------------------------
In email discussion, there is a mention that portable runner "stage fusion" is
not prone to element copy overhead. I would like to know more and understand
why it is safe to not do it there.
> Coder copy overhead
> -------------------
>
> Key: BEAM-7206
> URL: https://issues.apache.org/jira/browse/BEAM-7206
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink, sdk-java-core
> Reporter: Jozef Vilcek
> Priority: Major
>
> More context can be found in discussion here:
> [http://mail-archives.apache.org/mod_mbox/beam-dev/201904.mbox/%3CCAOUjMkyKV8npYJfS_PF3Gzo=vwomb2frzute81zsrxnm13t...@mail.gmail.com%3E]
> I am not sure how much is this runner dependent, but each operator's user
> function receives a copy of data element for isolation. Beam coders does copy
> by serializing to bytes and then deserialize back. This seems to impact
> performance and grows with job complexity.
> On a simple test pipeline described in discussion thread above, I noticed
> almost 2x speedup when CoderUtils.copy() just returned the object.
> Native Flink job does copy too, but via Kryo, which seems to be doing deep
> copy more effectively, on object level.
> What can be done in Beam to reduce this overhead?
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)