[ 
https://issues.apache.org/jira/browse/BEAM-11146?focusedWorklogId=508119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-508119
 ]

ASF GitHub Bot logged work on BEAM-11146:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Nov/20 12:43
            Start Date: 05/Nov/20 12:43
    Worklog Time Spent: 10m 
      Work Description: dmvk commented on pull request #13240:
URL: https://github.com/apache/beam/pull/13240#issuecomment-722354299


   > introduce usingImmutableTypes (or similar), which would have the meaning 
of both objectReuse and fasterCopy
   
   I think you got me wrong. What I'm trying to suggest is that there is 
basically no chance `objectReuse` can affect user code. The only thing it 
affects in existing pipelines is runner code, which we have full control of. 
So it makes sense to get rid of this flag completely, as there is absolutely 
no benefit in exposing it to the user. Hiding it behind another knob is even 
worse than the current state, as it would be hard to understand what it 
actually does.
   
   > Is this with and without my change enabled? If yes, would you mind sending 
me an email with the setup and the two pipelines? It would be very useful data 
for my thesis!
   
   It's without your change. Just to note, this patch should have no effect on 
the distribution of serializers used, as it only affects the internal behavior 
of CoderTypeSerializer.
   
   We can do some basic profiling with and without your patch within the next 
few weeks. I'll let you know once we have results ;)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 508119)
    Time Spent: 2.5h  (was: 2h 20m)

> Add option to disable copying between Flink runner 
> ---------------------------------------------------
>
>                 Key: BEAM-11146
>                 URL: https://issues.apache.org/jira/browse/BEAM-11146
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Teodor Spæren
>            Assignee: Teodor Spæren
>            Priority: P2
>              Labels: performance
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In order to implement Flink 
> [TypeSerializer|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/TypeSerializer.java]
>  the runner implements 
> [CoderTypeSerializer|https://github.com/apache/beam/blob/master/runners/flink/1.8/src/main/java/org/apache/beam/runners/flink/translation/types/CoderTypeSerializer.java#L84].
>  The way the {{copy}} function is implemented is by first serializing and 
> then deserializing each element. This means that such a deep copy needs to be 
> done between each operator and this can become a bottleneck.
> The reason the {{copy}} functions need to be implemented is that Flink 
> guarantees that elements will be deep copied between each operator. In Beam 
> this is the user's responsibility, so the copy is not strictly necessary.
> The aim of this improvement is to introduce an option on the Flink Runner 
> that eliminates this overhead by simply returning the value.
> [Here is the mailing list 
> discussion|https://lists.apache.org/thread.html/r24129dba98782e1cf4d18ec738ab9714dceb05ac23f13adfac5baad1%40%3Cdev.beam.apache.org%3E]
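
The two copy strategies described in the issue can be illustrated with a minimal, self-contained sketch. It is not the actual CoderTypeSerializer code: the class `CopySketch` and its methods are hypothetical, Java serialization stands in for a Beam Coder, and the `fasterCopy` parameter only borrows the name used in the pull request discussion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the runner's CoderTypeSerializer.
final class CopySketch {

  // Deep copy through a serialize/deserialize round trip, as the issue describes.
  // Assumption: Java serialization replaces the Beam Coder to keep this runnable.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T deepCopy(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Could not copy element", e);
    }
  }

  // Proposed shortcut: hand back the same reference and skip the round trip.
  // Only safe if no operator mutates the element after emitting it.
  static <T> T fastCopy(T value) {
    return value;
  }

  // Hypothetical switch between the two strategies.
  static <T extends Serializable> T copy(T value, boolean fasterCopy) {
    return fasterCopy ? fastCopy(value) : deepCopy(value);
  }

  public static void main(String[] args) {
    ArrayList<String> element = new ArrayList<>(Arrays.asList("a", "b"));
    ArrayList<String> deep = copy(element, false);
    ArrayList<String> fast = copy(element, true);
    // Deep copy: equal contents, but a distinct object.
    System.out.println(deep.equals(element) && deep != element);
    // Fast copy: the very same object.
    System.out.println(fast == element);
  }
}
```

The deep copy costs one full encode and decode per element per operator hop, which is the overhead the issue wants to make optional. The identity copy avoids that cost but relies on user code treating elements as immutable.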



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
