Arvid Heise created FLINK-23617:
-----------------------------------
Summary: Co-locate sink operators with same parallelism
Key: FLINK-23617
URL: https://issues.apache.org/jira/browse/FLINK-23617
Project: Flink
Issue Type: Improvement
Components: API / DataStream
Affects Versions: 1.14.0
Reporter: Arvid Heise
FLINK-19531 introduced the implementation of the Sink interface. It strictly
separates the sink pipeline into 3 operators:
writer -> committer -> global committer
In streaming mode with a parallelism p, the pipeline is executed as follows:
writer(parallelism=p) -> committer(parallelism=p) -> global committer(parallelism=1)
Here we could bundle writer+committer into one operator.
In batch mode with a parallelism p, the pipeline is executed as follows:
writer(parallelism=p) -> committer(parallelism=1) -> global committer(parallelism=1)
Here we could bundle committer+global committer into one operator. (The
committer needs to run with parallelism=1 to create a pipeline region and
reduce the risk of data loss during commit; we can hopefully fix this after
FLIP-147.)
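The two execution plans above, and the bundling suggested for each, can be summarized in a small model. This is only a sketch: `SinkTopologySketch`, `Stage`, `currentPlan` and `bundledPlan` are hypothetical names made up for illustration and are not part of Flink's API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the sink pipeline; not Flink API.
class SinkTopologySketch {

    enum Mode { STREAMING, BATCH }

    /** One operator in the sink pipeline together with its parallelism. */
    static final class Stage {
        final String name;
        final int parallelism;

        Stage(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }

        @Override
        public String toString() {
            return name + "(parallelism=" + parallelism + ")";
        }
    }

    /** Today's plan: always three operators. */
    static List<Stage> currentPlan(Mode mode, int p) {
        // The committer runs with p in streaming mode and with 1 in batch mode.
        int committerParallelism = (mode == Mode.STREAMING) ? p : 1;
        return Arrays.asList(
                new Stage("writer", p),
                new Stage("committer", committerParallelism),
                new Stage("global committer", 1));
    }

    /** Proposed plan: bundle the two neighbours that share a parallelism. */
    static List<Stage> bundledPlan(Mode mode, int p) {
        if (mode == Mode.STREAMING) {
            return Arrays.asList(
                    new Stage("writer+committer", p),
                    new Stage("global committer", 1));
        }
        return Arrays.asList(
                new Stage("writer", p),
                new Stage("committer+global committer", 1));
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " current: " + currentPlan(mode, 4));
            System.out.println(mode + " bundled: " + bundledPlan(mode, 4));
        }
    }
}
```

In both modes the bundled plan has two operators instead of three, and the bundled pair always shares the same parallelism.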
Having fewer operators will decrease the need to copy the committables in the
operator chain (where we currently mostly use Kryo). Because the committables
are then handed over without being copied, we can also implement
connection/transaction pooling in streaming, where committables are reused
after a successful commit.
The proposal of this ticket is to extract the functionality of the 7 different
operator implementations with their factories into reusable building blocks and
use them in 2 operators (writer+committer and committer+global committer).
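To make the idea of reusable building blocks more concrete, here is a rough sketch of a bundled writer+committer operator with pooling. All names (`FusedSinkSketch`, `Txn`, `WriterBlock`, `CommitterBlock`, `WriterCommitterOperator`) are hypothetical and do not correspond to existing Flink classes. Checkpointing is reduced to a single callback, which is a strong simplification of the real commit protocol.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these types are Flink classes.
class FusedSinkSketch {

    /** Stand-in for a committable, e.g. a reusable transaction handle. */
    static final class Txn {
        final int id;
        final List<String> records = new ArrayList<>();

        Txn(int id) {
            this.id = id;
        }
    }

    /** Building block 1: buffers records into the current transaction. */
    static final class WriterBlock {
        private final Deque<Txn> pool;
        private int created = 0;
        private Txn current;

        WriterBlock(Deque<Txn> pool) {
            this.pool = pool;
        }

        void write(String record) {
            if (current == null) {
                // Reuse a pooled transaction if one is available.
                Txn pooled = pool.poll();
                current = (pooled != null) ? pooled : new Txn(created++);
            }
            current.records.add(record);
        }

        /** Hands over the open transaction, or null if nothing was written. */
        Txn prepareCommit() {
            Txn committable = current;
            current = null;
            return committable;
        }

        int createdCount() {
            return created;
        }
    }

    /** Building block 2: commits a transaction and returns it to the pool. */
    static final class CommitterBlock {
        private final Deque<Txn> pool;
        final List<String> committed = new ArrayList<>();

        CommitterBlock(Deque<Txn> pool) {
            this.pool = pool;
        }

        void commit(Txn txn) {
            committed.addAll(txn.records);
            txn.records.clear();
            pool.add(txn); // reused after a successful commit
        }
    }

    /** Bundled operator: the committable is passed by reference, never copied. */
    static final class WriterCommitterOperator {
        private final Deque<Txn> pool = new ArrayDeque<>();
        final WriterBlock writer = new WriterBlock(pool);
        final CommitterBlock committer = new CommitterBlock(pool);

        void processElement(String record) {
            writer.write(record);
        }

        // Simplification: a real operator would separate snapshot and commit.
        void onCheckpointComplete() {
            Txn committable = writer.prepareCommit();
            if (committable != null) {
                committer.commit(committable);
            }
        }
    }

    public static void main(String[] args) {
        WriterCommitterOperator op = new WriterCommitterOperator();
        op.processElement("a");
        op.processElement("b");
        op.onCheckpointComplete();
        op.processElement("c");
        op.onCheckpointComplete();
        System.out.println("committed=" + op.committer.committed
                + ", transactions created=" + op.writer.createdCount());
    }
}
```

Because both blocks live in one operator, the committable never crosses an operator boundary, so no serializer is involved and the same `Txn` instance can be handed back to the writer after the commit.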