[
https://issues.apache.org/jira/browse/FLINK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371706#comment-14371706
]
Stephan Ewen commented on FLINK-1764:
-------------------------------------
I think that very much depends on the contract that you define:
- After a source function has handed a record to the collector, should that
record be guaranteed to stay unchanged? If you do not promise that, you need
not copy.
- Do you want to guarantee that the value emitted by a map function is never
changed? That is only ever a problem if the MapFunction retains a reference
to that value (by storing it in a list, for example).
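To make the second point concrete, here is a minimal sketch of what goes wrong when a function retains references while the producer reuses one mutable object. This is plain Java, not Flink code; `Event`, `emitWithReuse`, and `sum` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Flink API.
final class RetainedReferenceDemo {

    /** A mutable record type, standing in for a "heavier" user object. */
    static final class Event {
        int value;
    }

    /**
     * The producer reuses a single Event instance for every emission.
     * The list stands in for a function that stores its inputs.
     */
    static List<Event> emitWithReuse(int n) {
        List<Event> retained = new ArrayList<>();
        Event reused = new Event();
        for (int i = 1; i <= n; i++) {
            reused.value = i;     // overwrite the same object in place
            retained.add(reused); // no copy: every entry aliases `reused`
        }
        return retained;
    }

    /** Adds up the values of all retained events. */
    static int sum(List<Event> events) {
        int total = 0;
        for (Event e : events) {
            total += e.value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Emitting 1, 2, 3 should sum to 6, but all three entries are the
        // same object holding the last value, so this prints 9.
        System.out.println(sum(emitWithReuse(3)));
    }
}
```

If the function did not retain the reference (or the record type were immutable), the missing copy would be invisible, which is why the contract matters more than the copy itself.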
I am unsure whether always copying is a good way to go. The initial use cases
here all use very small records (often of immutable types anyway), for which
copying is cheap. As soon as someone uses heavier objects, copying can
noticeably hurt performance.
I am curious whether we can avoid that by making the copies optional. Copying
could then be either on or off by default.
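One possible shape for optional copying, as a rough sketch only: a collector wrapper that takes a flag and a copy function. The class name `OptionalCopyCollector`, the `copyEnabled` flag, and the `copier` parameter are all assumptions made for this example and do not reflect how Flink implements or configures this.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch; not part of the Flink API.
final class OptionalCopyCollector<T> {
    private final Consumer<T> downstream;   // next function in the chain
    private final UnaryOperator<T> copier;  // how to deep-copy a record
    private final boolean copyEnabled;      // the on/off switch

    OptionalCopyCollector(Consumer<T> downstream,
                          UnaryOperator<T> copier,
                          boolean copyEnabled) {
        this.downstream = downstream;
        this.copier = copier;
        this.copyEnabled = copyEnabled;
    }

    /** Forwards the record, copying it first only when copying is enabled. */
    void collect(T record) {
        downstream.accept(copyEnabled ? copier.apply(record) : record);
    }
}
```

With copying enabled, a downstream function may safely retain what it receives; with copying disabled, the record is passed by reference and the cost of the copy is avoided, which matters for heavy objects.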
> Rework record copying logic in streaming API
> --------------------------------------------
>
> Key: FLINK-1764
> URL: https://issues.apache.org/jira/browse/FLINK-1764
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.9
> Reporter: Stephan Ewen
>
> The logic for chained tasks in the streaming API does a lot of copying of
> records. In some cases, a record is copied multiple times before being passed
> to a function.
> This seems unnecessary, in the general case. In any case, multiple copies
> seem incorrect.