[
https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045373#comment-17045373
]
Stephan Ewen commented on FLINK-15670:
--------------------------------------
I think this looks nice.
One thought would be to combine this into one call, to avoid the need to align
these two function calls, and to be able to directly use the TypeInformation on
the reader side for deserialization, and avoid the need to supply the
environment.
{code}
DataStream<MyType> source = ...;
KeyedStream<MyType> keyedStream = KafkaShuffle.persistentKeyBy(source, topic,
numberOfPartitions, producerProperties, keySelector);
{code}
Maybe "numberOfPartitions" can even be the "maxParallelism" implicitly?
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's
> KeyGroups
> -------------------------------------------------------------------------------------
>
> Key: FLINK-15670
> URL: https://issues.apache.org/jira/browse/FLINK-15670
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream, Connectors / Kafka
> Reporter: Stephan Ewen
> Priority: Major
> Labels: usability
> Fix For: 1.11.0
>
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job
> into smaller jobs and independent pipelined regions that fail over
> independently.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)