[
https://issues.apache.org/jira/browse/FLINK-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333422#comment-15333422
]
ASF GitHub Bot commented on FLINK-3974:
---------------------------------------
GitHub user aljoscha opened a pull request:
https://github.com/apache/flink/pull/2110
[FLINK-3974] Fix object reuse with multi-chaining
Before, a job would fail if object reuse was enabled and multiple
operators were chained to one upstream operator. Now, we always create a
shallow copy of the StreamRecord in OperatorChain.ChainingOutput because
downstream operations change/reuse the StreamRecord.
This fix was contributed by @wanderingbort (if this is the right github
handle) as a patch on the Flink Jira. I can change the commit to attribute it
to him but so far he didn't respond to my question about this on Jira.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aljoscha/flink chaining/fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2110.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2110
----
commit 092f350cccbda32331f527c4eaf7ad3304fa1811
Author: Aljoscha Krettek <[email protected]>
Date: 2016-06-14T10:18:35Z
[FLINK-3974] Fix object reuse with multi-chaining
Before, a job would fail if object reuse was enabled and multiple
operators were chained to one upstream operator. Now, we always create a
shallow copy of the StreamRecord in OperatorChain.ChainingOutput because
downstream operations change/reuse the StreamRecord.
----
> enableObjectReuse fails when an operator chains to multiple downstream
> operators
> --------------------------------------------------------------------------------
>
> Key: FLINK-3974
> URL: https://issues.apache.org/jira/browse/FLINK-3974
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.0.3
> Reporter: B Wyatt
> Attachments: ReproFLINK3974.java, bwyatt-FLINK3974.1.patch
>
>
> Given a topology that looks like this:
> {code:java}
> DataStream<A> input = ...
> input
> .map(MapFunction<A,B>...)
> .addSink(...);
> input
> .map(MapFunction<A,C>...)
> .addSink(...);
> {code}
> enableObjectReuse() will cause an exception in the form of
> {{"java.lang.ClassCastException: B cannot be cast to A"}} to be thrown.
> It looks like the input operator calls {{Output<StreamRecord<A>>.collect}}
> which attempts to loop over the downstream operators and process them.
> However, the first map operation will call {{StreamRecord<>.replace}} which
> mutates the value stored in the StreamRecord<>.
> As a result, when the {{Output<StreamRecord<A>>.collect}} call passes the
> {{StreamRecord<A>}} to the second map operation it is actually a
> {{StreamRecord<B>}} and behaves as if the two map operations were serial
> instead of parallel.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)