[
https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035734#comment-17035734
]
Stephan Ewen commented on FLINK-15670:
--------------------------------------
One more issue is the question of how to serialize / deserialize the data in
Kafka, and how to handle watermarks:
* A good starting point would be {{TypeInformationSerializationSchema}}. You
know the data types from the DataStream and can use its type information to
create this schema (see the first sketch after this list).
* In the end, you need to forward watermarks through the Kafka topic. The
easiest way is probably to write the StreamElements into the Kafka topic,
i.e. the type that multiplexes events, watermarks, etc. (see the second
sketch below).
* On the Kafka Consumer side, you would then use a "punctuated watermark
assigner" that turns the watermark records back into watermarks. Make sure you
configure it on the partition level in the Kafka Consumer (see the third
sketch below).
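
As a rough sketch of the serialization part (the topic name "keyed-topic" and
the plain {{String}} payload are placeholders, not part of this ticket):
{{TypeInformationSerializationSchema}} is built from the stream's own type
information and works as both the serialization and the deserialization schema.

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.TypeInformationSerializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaSerdeSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; in the real job this is the keyed stream to be written out.
        DataStream<String> events = env.fromElements("a", "b", "c");

        // Build the schema from the stream's own type information.
        TypeInformationSerializationSchema<String> schema =
                new TypeInformationSerializationSchema<>(events.getType(), env.getConfig());

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "aligned-reader");

        // Producer side: write the stream into the topic using Flink's own serializers.
        events.addSink(new FlinkKafkaProducer<>("keyed-topic", schema, props));

        // Consumer side: the same schema also implements DeserializationSchema.
        DataStream<String> readBack =
                env.addSource(new FlinkKafkaConsumer<>("keyed-topic", schema, props));
        readBack.print();

        env.execute("kafka-serde-sketch");
    }
}
{code}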
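
For the multiplexing, a hedged sketch of what such a record type could look
like. The name {{EventOrWatermark}} is made up for illustration; the
alternative mentioned above is to reuse Flink's internal {{StreamElement}} /
{{StreamElementSerializer}} directly. Note that actually emitting the watermark
markers when the upstream watermark advances would need a custom operator
(e.g. one overriding {{processWatermark}}), which is not shown here.

{code:java}
import java.io.Serializable;

/**
 * Hypothetical multiplexing record, analogous in spirit to Flink's internal
 * StreamElement: each Kafka record is either a regular event or a watermark marker.
 */
public class EventOrWatermark<T> implements Serializable {

    private static final long serialVersionUID = 1L;

    public T event;                 // the payload, when this record carries an event
    public long watermarkTimestamp; // the watermark time, when this record carries a watermark
    public boolean isWatermark;     // discriminator between the two cases

    public EventOrWatermark() {}    // no-arg constructor for Flink's POJO serialization

    public static <T> EventOrWatermark<T> event(T event) {
        EventOrWatermark<T> e = new EventOrWatermark<>();
        e.event = event;
        return e;
    }

    public static <T> EventOrWatermark<T> watermark(long timestamp) {
        EventOrWatermark<T> w = new EventOrWatermark<>();
        w.watermarkTimestamp = timestamp;
        w.isWatermark = true;
        return w;
    }
}
{code}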
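
On the consumer side, a sketch using the {{AssignerWithPunctuatedWatermarks}}
interface (matching the wording above) together with the hypothetical
{{EventOrWatermark}} type from the previous sketch. Because the assigner is set
on the {{FlinkKafkaConsumer}} itself via {{assignTimestampsAndWatermarks}}, it
runs per Kafka partition, as required.

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.TypeInformationSerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PerPartitionWatermarkSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "aligned-reader");

        TypeInformation<EventOrWatermark<String>> typeInfo =
                TypeInformation.of(new TypeHint<EventOrWatermark<String>>() {});

        FlinkKafkaConsumer<EventOrWatermark<String>> consumer = new FlinkKafkaConsumer<>(
                "keyed-topic",
                new TypeInformationSerializationSchema<>(typeInfo, env.getConfig()),
                props);

        // Configured on the consumer itself, the assigner runs per Kafka partition,
        // so each partition's markers only advance that partition's watermark.
        consumer.assignTimestampsAndWatermarks(
                new AssignerWithPunctuatedWatermarks<EventOrWatermark<String>>() {

                    @Override
                    public long extractTimestamp(EventOrWatermark<String> element,
                                                 long previousElementTimestamp) {
                        // Markers carry the watermark time; events keep their record timestamp.
                        return element.isWatermark
                                ? element.watermarkTimestamp
                                : previousElementTimestamp;
                    }

                    @Override
                    public Watermark checkAndGetNextWatermark(EventOrWatermark<String> lastElement,
                                                              long extractedTimestamp) {
                        // Turn watermark records back into real watermarks; events emit none.
                        return lastElement.isWatermark
                                ? new Watermark(lastElement.watermarkTimestamp)
                                : null;
                    }
                });

        env.addSource(consumer)
           .filter(e -> !e.isWatermark) // drop the markers, keep only real events downstream
           .print();

        env.execute("per-partition-watermark-sketch");
    }
}
{code}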
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's
> KeyGroups
> -------------------------------------------------------------------------------------
>
> Key: FLINK-15670
> URL: https://issues.apache.org/jira/browse/FLINK-15670
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream, Connectors / Kafka
> Reporter: Stephan Ewen
> Priority: Major
> Labels: usability
> Fix For: 1.11.0
>
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them
> without partitioning them again (avoiding shuffles).
> 2. You can use this to shuffle through Kafka, thereby decomposing the job
> into smaller jobs and independent pipelined regions that fail over
> independently.