[
https://issues.apache.org/jira/browse/FLINK-31654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weijie Guo updated FLINK-31654:
-------------------------------
Fix Version/s: 2.0.0
(was: 1.20.0)
> DataStreamUtils.reinterpretAsKeyedStream() should not override the user
> specified chaining strategy.
> ----------------------------------------------------------------------------------------------------
>
> Key: FLINK-31654
> URL: https://issues.apache.org/jira/browse/FLINK-31654
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream
> Affects Versions: 1.17.0, 1.14.6, 1.16.1, 1.15.4
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
> Priority: Major
> Fix For: 2.0.0
>
>
> Currently {{DataStreamUtils.reinterpretAsKeyedStream()}} does not work well
> with batch jobs. Currently the chaining strategy of the StreamOperators
> applied to a KeyedStream is always overridden to HEAD. This is because in
> batch execution mode the records have to be sorted by keys before they are
> fed to the stateful operators. The runtime relies on the shuffle to do the
> sort so a shuffle is needed for the stateful operators.
> However, for {{DataStreamUtils.reinterpretAsKeyedStream()}} this results in
> unexpected behavior. It breaks the operator chain and defeats the purpose of
> reinterpreting the stream instead of calling {{keyBy.}}
> To fix this issue, we need to do the following for reinterpretAsKeyedStream:
> # Add a sort operator instead of relying on the shuffle to do the sort.
> # Stop overriding the chaining strategy specified by the user for the
> operators applied to the result KeyedStream.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)