[ 
https://issues.apache.org/jira/browse/FLINK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371706#comment-14371706
 ] 

Stephan Ewen commented on FLINK-1764:
-------------------------------------

I think that very much depends on the contract that you define:

  - After a source function hands a record to the collector, should the record 
be guaranteed to remain unchanged? If you do not promise that, you need not copy.
  - Do you want to guarantee that the value emitted by a map function is never 
changed? That only matters if the MapFunction retains a reference to the value 
(for example, by storing it in a list).
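
To make the second point concrete, here is a minimal sketch in plain Java (no 
Flink classes). The names Event and RetainedReferenceDemo are hypothetical and 
only stand in for a mutable record type and a function that stores emitted 
values in a list. It assumes the producer reuses one mutable object for every 
record:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record type, not a Flink class.
final class Event {
    int value;

    Event(int value) {
        this.value = value;
    }

    Event copy() {
        return new Event(value);
    }
}

final class RetainedReferenceDemo {

    // Emits three records from one reused object. The 'retained' list stands
    // in for a function that keeps references to the values it receives.
    static List<Integer> run(boolean copyOnEmit) {
        List<Event> retained = new ArrayList<>();
        Event reused = new Event(0);
        for (int i = 1; i <= 3; i++) {
            reused.value = i;
            // Without a copy, every list entry aliases the same object.
            retained.add(copyOnEmit ? reused.copy() : reused);
        }
        List<Integer> seen = new ArrayList<>();
        for (Event e : retained) {
            seen.add(e.value);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [3, 3, 3]
        System.out.println(run(true));  // prints [1, 2, 3]
    }
}
```

If nothing retains a reference, both variants behave the same and the copy is 
pure overhead.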

I am not sure that always copying is the right approach. The initial use cases 
here all use very small records (often of immutable types anyway), where 
copying is cheap. As soon as someone uses heavier objects, copying every 
record can hurt performance noticeably.

I am curious whether we can avoid that cost by making the copies optional. 
Copying could be either on or off by default.
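
One way optional copying could look, as a rough sketch only: the class 
OptionallyCopyingCollector, its constructor parameters, and copiesMade() are 
all hypothetical and are not part of the Flink API. The flag decides whether a 
record is copied before it is handed to the next function in the chain:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical wrapper, not a Flink class.
final class OptionallyCopyingCollector<T> {
    private final Consumer<T> downstream;   // next function in the chain
    private final UnaryOperator<T> copier;  // how to deep-copy a record
    private final boolean copyEnabled;      // the on/off switch
    private int copies = 0;

    OptionallyCopyingCollector(Consumer<T> downstream,
                               UnaryOperator<T> copier,
                               boolean copyEnabled) {
        this.downstream = downstream;
        this.copier = copier;
        this.copyEnabled = copyEnabled;
    }

    void collect(T record) {
        if (copyEnabled) {
            copies++;
            downstream.accept(copier.apply(record));
        } else {
            // Pass the reference through; cheap, but the receiver must not
            // assume the object stays unchanged.
            downstream.accept(record);
        }
    }

    int copiesMade() {
        return copies;
    }

    public static void main(String[] args) {
        List<StringBuilder> received = new ArrayList<>();
        OptionallyCopyingCollector<StringBuilder> collector =
            new OptionallyCopyingCollector<StringBuilder>(
                received::add, sb -> new StringBuilder(sb), true);

        StringBuilder record = new StringBuilder("a");
        collector.collect(record);
        record.append("b"); // later mutation by the producer

        System.out.println(received.get(0)); // prints a
    }
}
```

With the flag off, the receiver would see "ab" in the example above, which is 
exactly the contract question raised earlier.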

> Rework record copying logic in streaming API
> --------------------------------------------
>
>                 Key: FLINK-1764
>                 URL: https://issues.apache.org/jira/browse/FLINK-1764
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>
> The logic for chained tasks in the streaming API does a lot of copying of 
> records. In some cases, a record is copied multiple times before being passed 
> to a function.
> This seems unnecessary, in the general case. In any case, multiple copies 
> seem incorrect.



