[ https://issues.apache.org/jira/browse/FLINK-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733825#comment-14733825 ]
ASF GitHub Bot commented on FLINK-2631:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/1101
[FLINK-2631] [streaming] Fixes the StreamFold operator and adds output type configurable stream operators
Adds support for non-serializable initial fold values by serializing the value
into a byte array before shipping it. The shipped initial fold value is
deserialized on the TaskManager (TM) when the operator's `open` method is called.
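The byte-array approach can be sketched in plain Java. This is not Flink's code: `ValueSerializer` is a hypothetical stand-in for Flink's `TypeSerializer`, and `FoldOperatorSketch`, `FoldFunction`, `Counter`, `CounterSerializer` and `Shipping` are illustrative names. The operator serializes the initial value in its constructor, ships only the bytes (the live accumulator field is `transient`), and rebuilds the value in `open`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for Flink's TypeSerializer; must itself be serializable. */
interface ValueSerializer<T> extends Serializable {
    void serialize(T value, DataOutputStream out) throws IOException;
    T deserialize(DataInputStream in) throws IOException;
}

/** Hypothetical fold function type; Serializable so lambdas can be shipped. */
interface FoldFunction<IN, OUT> extends Serializable {
    OUT fold(OUT accumulator, IN value);
}

/** A fold accumulator that deliberately does NOT implement Serializable. */
class Counter {
    final long count;
    Counter(long count) { this.count = count; }
}

class CounterSerializer implements ValueSerializer<Counter> {
    public void serialize(Counter value, DataOutputStream out) throws IOException {
        out.writeLong(value.count);
    }
    public Counter deserialize(DataInputStream in) throws IOException {
        return new Counter(in.readLong());
    }
}

/** Sketch of a fold operator that ships its initial value as bytes. */
class FoldOperatorSketch<IN, OUT> implements Serializable {
    private final byte[] serializedInitialValue; // shipped with the operator
    private final ValueSerializer<OUT> serializer;
    private final FoldFunction<IN, OUT> folder;
    private transient OUT accumulator;            // not shipped; rebuilt in open()

    FoldOperatorSketch(OUT initialValue, ValueSerializer<OUT> serializer,
                       FoldFunction<IN, OUT> folder) throws IOException {
        this.serializer = serializer;
        this.folder = folder;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serializer.serialize(initialValue, out);
        }
        this.serializedInitialValue = bytes.toByteArray();
    }

    /** Runs on the worker before any element is processed. */
    void open() throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(serializedInitialValue))) {
            accumulator = serializer.deserialize(in);
        }
    }

    OUT processElement(IN value) {
        accumulator = folder.fold(accumulator, value);
        return accumulator;
    }
}

class Shipping {
    /** Simulates deployment by Java-serializing an object and reading it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T ship(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

Because only the serializer and the byte array travel with the operator, the accumulator type never has to implement `java.io.Serializable`; only its serializer does.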
Furthermore, this PR introduces the `OutputTypeConfigurable` interface,
which allows stream operators to learn their output type. The
`OutputTypeConfigurable` interface offers the method `setOutputType`, which
the `StreamGraph` calls when the `StreamOperator` is added in the
`addOperator` method. By this point at the latest, the concrete output type,
whether inferred from the UDF or set manually with `returns`, should be known to
the system, because the input and output type serializers for the vertex
are also created in the `addOperator` method. All stream operators that need to
know their output type should implement the `OutputTypeConfigurable` interface.
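A stripped-down model of the injection point may help. The sketch assumes a hypothetical `TypeInfo` class in place of Flink's `TypeInformation` and a toy `MiniStreamGraph` in place of `StreamGraph`; the `setOutputType` signature here is simplified and should not be read as Flink's exact API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Flink's TypeInformation. */
class TypeInfo<T> {
    final Class<T> clazz;
    TypeInfo(Class<T> clazz) { this.clazz = clazz; }
}

/** Simplified version of the interface described in the PR. */
interface OutputTypeConfigurable<OUT> {
    void setOutputType(TypeInfo<OUT> outputType);
}

/** Hypothetical operator that needs its output type to build a serializer. */
class TypedFoldOperator<OUT> implements OutputTypeConfigurable<OUT> {
    TypeInfo<OUT> outputType; // null until the graph injects it

    @Override
    public void setOutputType(TypeInfo<OUT> outputType) {
        this.outputType = outputType;
    }
}

/** Toy graph: addOperator is the point where the output type is final. */
class MiniStreamGraph {
    final List<Object> operators = new ArrayList<>();

    @SuppressWarnings("unchecked")
    <OUT> void addOperator(Object operator, TypeInfo<OUT> outputType) {
        // The vertex serializers are also created here, so the output type
        // must already be known; forward it to operators that ask for it.
        if (operator instanceof OutputTypeConfigurable) {
            ((OutputTypeConfigurable<OUT>) operator).setOutputType(outputType);
        }
        operators.add(operator);
    }
}
```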
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink fixStreamingFold
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1101.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1101
----
commit 63951adca0e8bfefd1d81b933017e9fadc5f556f
Author: Till Rohrmann <[email protected]>
Date: 2015-09-07T09:34:48Z
[FLINK-2631] [streaming] Fixes the StreamFold operator. Adds
OutputTypeConfigurable interface to support type injection at StreamGraph
creation.
Adds test for non-serializable fold type. Adds test to verify proper output
type forwarding for OutputTypeConfigurable implementations.
Adds comments
----
> StreamFold operator does not respect returns type and stores non serializable values
> ------------------------------------------------------------------------------------
>
> Key: FLINK-2631
> URL: https://issues.apache.org/jira/browse/FLINK-2631
> Project: Flink
> Issue Type: Bug
> Reporter: Till Rohrmann
>
> The {{StreamFold}} operator stores the initial value of the fold operation,
> which is shipped along with the operator for the task deployment. This value
> is not necessarily serializable. Thus, using the fold operation with a
> non-serializable initial value will fail the job.
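The failure mode can be reproduced with standard Java serialization, assuming (as the issue implies) that the operator is shipped to the workers as a Java-serialized object. `Accumulator`, `NaiveFold` and `DeploymentCheck` are hypothetical names; `ObjectOutputStream.writeObject` throws `NotSerializableException` when it reaches the non-serializable field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical fold accumulator; note that it does not implement Serializable. */
class Accumulator {
    long sum;
}

/** Hypothetical operator that keeps the raw initial value as a field. */
class NaiveFold implements Serializable {
    final Object initialValue; // serialized along with the operator
    NaiveFold(Object initialValue) { this.initialValue = initialValue; }
}

class DeploymentCheck {
    /** Returns true if the operator survives Java serialization. */
    static boolean canShip(Serializable operator) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(operator);
            return true;
        } catch (NotSerializableException e) {
            return false; // raised for a non-serializable initialValue
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```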
> Moreover, the {{StreamFold}} operator needs to know the output type in order
> to create a {{TypeSerializer}}. For {{StreamGraphs}} where the output type is
> not known when the operator is created, as is the case for the Scala
> DataStream API, which sets the output type via the {{returns}} method directly
> after creating the operator, this approach will fail. The reason is
> that the {{StreamFold}} operator does not receive the type information set by
> the {{returns}} method. Therefore, the job will fail at runtime because the
> operator tries to create a serializer from a {{MissingTypeInformation}}.
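The ordering problem can be modeled without Flink. In this hypothetical sketch, `TypeDescriptor.MISSING` plays the role of `MissingTypeInformation`, `EagerFold` captures its output type at construction, and `Transformation.returns` updates the type only afterwards, so the operator still holds the missing type when it opens.

```java
/** Hypothetical type descriptor; MISSING models MissingTypeInformation. */
class TypeDescriptor {
    static final TypeDescriptor MISSING = new TypeDescriptor(null);
    final Class<?> clazz;
    TypeDescriptor(Class<?> clazz) { this.clazz = clazz; }

    /** Stand-in for creating a serializer; returns a description string. */
    String createSerializer() {
        if (clazz == null) {
            throw new IllegalStateException("cannot create serializer from missing type information");
        }
        return "serializer for " + clazz.getSimpleName();
    }
}

/** Eager variant: freezes whatever type it sees at construction time. */
class EagerFold {
    private final TypeDescriptor outputType;
    EagerFold(TypeDescriptor outputType) { this.outputType = outputType; }
    String open() { return outputType.createSerializer(); }
}

/** Hypothetical transformation whose type is set later via returns(). */
class Transformation {
    TypeDescriptor outputType = TypeDescriptor.MISSING;
    final EagerFold operator = new EagerFold(outputType); // captures MISSING

    Transformation returns(Class<?> clazz) {
        outputType = new TypeDescriptor(clazz); // the operator never sees this update
        return this;
    }
}
```

Deferring the capture until the graph pushes the final type into the operator, which is what `OutputTypeConfigurable` does during `addOperator`, removes this ordering dependence.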
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)