[jira] [Commented] (FLINK-15670) Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's KeyGroups

Stephan Ewen (Jira) Wed, 26 Feb 2020 02:39:46 -0800


    [ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045373#comment-17045373 ]


Stephan Ewen commented on FLINK-15670:
--------------------------------------

I think this looks nice.

One thought would be to combine the two steps into a single call. That would 
avoid the need to keep the two function calls aligned, let the reader side use 
the TypeInformation directly for deserialization, and remove the need to 
supply the environment.

{code}
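DataStream<MyType> source = ...;
// One call writes the stream to the Kafka topic and reads it back as a keyed stream.
// MyKey is a placeholder for the key type returned by keySelector.
KeyedStream<MyType, MyKey> keyedStream = KafkaShuffle.persistentKeyBy(
    source, topic, numberOfPartitions, producerProperties, keySelector);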
{code}

Maybe "numberOfPartitions" can even be the "maxParallelism" implicitly?
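
To illustrate why that could work, here is a minimal, self-contained sketch (no Flink dependency) of how key groups map to parallel subtasks. The class name and the simplified hash are assumptions for illustration only; Flink itself applies a murmur hash to the key's hashCode. The subtask formula follows, to my understanding, the range assignment Flink uses for key groups. If the number of Kafka partitions equals maxParallelism, each partition can correspond to exactly one key group, so a subtask reads a contiguous range of partitions.

```java
public class KeyGroupAlignmentSketch {

    // Simplified stand-in for Flink's key-group hashing (assumption:
    // Flink additionally applies a murmur hash to key.hashCode()).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel subtask that owns it.
    static int operatorIndexForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 8; // assumed to equal the number of Kafka partitions
        int parallelism = 4;
        for (int partition = 0; partition < maxParallelism; partition++) {
            // With one partition per key group, the partition id is the key group id.
            int subtask = operatorIndexForKeyGroup(partition, maxParallelism, parallelism);
            System.out.println("partition/key group " + partition + " -> subtask " + subtask);
        }
    }
}
```

With 8 partitions and parallelism 4, each subtask owns two adjacent partitions, so records already partitioned by key in Kafka need no further shuffle.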

> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's 
> KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them 
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job 
> into smaller jobs and independent pipelined regions that fail over 
> independently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
