[ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035734#comment-17035734 ]

Stephan Ewen commented on FLINK-15670:
--------------------------------------

One more issue is the question of how to serialize / deserialize the data in 
Kafka and how to handle watermarks:

* A good starting point would be {{TypeInformationSerializationSchema}}. You 
know the data types from the DataStream and can use them to create that type 
information (see the producer sketch below).
* In the end, you also need to forward watermarks through the Kafka topic. The 
easiest way is probably to just write the StreamElements into the Kafka topic, 
i.e., the type that multiplexes events, watermarks, etc. (the producer sketch 
below illustrates this with a simplified envelope type).
* On the Kafka consumer side, you would then use a "punctuated watermark 
assigner" that turns the watermark records back into watermarks. Make sure you 
configure it at the partition level, i.e., directly on the Kafka consumer (see 
the consumer sketch after this list).
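
A minimal producer-side sketch of the first two points. Instead of Flink's internal StreamElement type, it multiplexes events and watermark markers through a hand-rolled {{Envelope}} POJO; the {{Envelope}} class, its field names, and the "aligned-topic" name are made up for illustration, while {{TypeInformationSerializationSchema}} and {{FlinkKafkaProducer}} are the actual connector API:

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.TypeInformationSerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class EnvelopeProducerSketch {

    /** Hypothetical multiplexing type: carries either a regular event or a watermark marker. */
    public static class Envelope {
        public boolean isWatermark;   // true -> this record carries a watermark, not an event
        public long watermarkMillis;  // watermark timestamp, only meaningful when isWatermark is true
        public String payload;        // event payload, only meaningful when isWatermark is false

        public Envelope() {}          // no-arg constructor so Flink treats this as a POJO

        public Envelope(boolean isWatermark, long watermarkMillis, String payload) {
            this.isWatermark = isWatermark;
            this.watermarkMillis = watermarkMillis;
            this.payload = payload;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Derive the (de)serialization schema from the element type of the stream.
        TypeInformationSerializationSchema<Envelope> schema =
                new TypeInformationSerializationSchema<>(
                        TypeInformation.of(Envelope.class), env.getConfig());

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");

        DataStream<Envelope> envelopes = env.fromElements(
                new Envelope(false, 0L, "event-1"),
                new Envelope(true, 1_000L, ""));   // a watermark marker written as a data record

        // Write the multiplexed records; the same schema can be reused on the consumer side.
        envelopes.addSink(new FlinkKafkaProducer<>("aligned-topic", schema, props));

        env.execute("envelope-producer-sketch");
    }
}
{code}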
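
And a matching consumer-side sketch for the last point: the punctuated watermark assigner is set directly on the {{FlinkKafkaConsumer}}, so timestamps and watermarks are generated per Kafka partition inside the source. It assumes the hypothetical {{Envelope}} POJO from the sketch above is available; the rest is the pre-1.11 DataStream / Kafka connector API:

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.TypeInformationSerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class EnvelopeConsumerSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Same schema as on the producer side, derived from the Envelope type.
        TypeInformationSerializationSchema<EnvelopeProducerSketch.Envelope> schema =
                new TypeInformationSerializationSchema<>(
                        TypeInformation.of(EnvelopeProducerSketch.Envelope.class), env.getConfig());

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "aligned-consumer");

        FlinkKafkaConsumer<EnvelopeProducerSketch.Envelope> consumer =
                new FlinkKafkaConsumer<>("aligned-topic", schema, props);

        // Setting the assigner on the consumer itself applies it per Kafka partition,
        // so the source tracks and merges watermarks per partition.
        consumer.assignTimestampsAndWatermarks(
                new AssignerWithPunctuatedWatermarks<EnvelopeProducerSketch.Envelope>() {

                    @Override
                    public long extractTimestamp(
                            EnvelopeProducerSketch.Envelope element, long previousElementTimestamp) {
                        // Watermark markers carry their own timestamp; plain events keep the previous one.
                        return element.isWatermark ? element.watermarkMillis : previousElementTimestamp;
                    }

                    @Override
                    public Watermark checkAndGetNextWatermark(
                            EnvelopeProducerSketch.Envelope lastElement, long extractedTimestamp) {
                        // Turn the multiplexed watermark record back into a real watermark.
                        return lastElement.isWatermark
                                ? new Watermark(lastElement.watermarkMillis)
                                : null;
                    }
                });

        DataStream<EnvelopeProducerSketch.Envelope> stream = env.addSource(consumer);
        stream.print();

        env.execute("envelope-consumer-sketch");
    }
}
{code}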

> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's 
> KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them 
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job 
> into smaller jobs and independent pipelined regions that fail over 
> independently.



