[ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108783#comment-17108783 ]
Yuan Mei edited comment on FLINK-15670 at 5/16/20, 12:30 AM:
-------------------------------------------------------------
Things to follow up and discuss (listed here so I do not forget about them):
# Address cases where the number of partitions != the number of consumer tasks
# Batch-emit Kafka fetcher records (similar to FLINK-17307, "Add collector to deserialize in KafkaDeserializationSchema")
# Whether to separate the sink (producer) and the source (consumer) into different jobs
** Although the two are recovered independently under regional failover, they share the same checkpoint coordinator, and correspondingly the same global checkpoint snapshot
** That is, if the consumer fails, the producer cannot commit the written data because of the two-phase-commit setup (it needs a checkpoint-complete signal to finish the second phase)
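The two-phase-commit dependency in the last sub-point can be sketched as follows. This is a minimal illustration with hypothetical class and method names, not Flink's actual `TwoPhaseCommitSinkFunction` API: pre-committed transactions only become visible once the checkpoint coordinator signals checkpoint completion, so a failure anywhere in the shared checkpoint scope delays the commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Flink's real API) of why the producer's second
// commit phase depends on a checkpoint-complete signal that a failed
// consumer in the same job can block.
class TwoPhaseCommitSketch {
    // Transactions pre-committed at checkpoint time, keyed by checkpoint id.
    private final Map<Long, List<String>> pending = new HashMap<>();
    private final List<String> committed = new ArrayList<>();

    // Phase 1: on checkpoint, flush and pre-commit the open transaction.
    void preCommit(long checkpointId, List<String> records) {
        pending.put(checkpointId, new ArrayList<>(records));
    }

    // Phase 2: runs only when the checkpoint coordinator reports success.
    // If any task in the job (e.g. the consumer) fails before that signal,
    // this is never called and the data stays invisible to downstream readers.
    void notifyCheckpointComplete(long checkpointId) {
        List<String> txn = pending.remove(checkpointId);
        if (txn != null) {
            committed.addAll(txn);
        }
    }

    List<String> visibleRecords() {
        return committed;
    }
}
```

Splitting producer and consumer into separate jobs would give each its own checkpoint coordinator, removing this coupling.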
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's
> KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Assignee: Yuan Mei
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job
> into smaller jobs and independent pipelined regions that fail over
> independently.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
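The alignment idea behind purpose 1 can be sketched as follows. This is a simplified illustration under the assumption that each Kafka partition is treated as one key group; Flink's real logic lives in `KeyGroupRangeAssignment` and murmur-hashes the key first, but the key-group-to-subtask formula used here matches the shape of Flink's range assignment. When the partition count does not equal the task count (follow-up item 1 in the comment above), some subtasks own several partitions and some may own none.

```java
// Hypothetical sketch of aligning Kafka partitions with Flink key groups so
// a keyed operator can consume its partitions without a shuffle.
class PartitionAlignmentSketch {
    // Flink maps a key group to an operator subtask roughly like this:
    // subtasks own contiguous ranges of the [0, maxParallelism) key-group space.
    static int subtaskForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Treating partition i as key group i (with numPartitions playing the
    // role of maxParallelism): assign the partition to the subtask that
    // owns that key group, so keyed state and partition data co-locate.
    static int subtaskForPartition(int partition, int numPartitions, int parallelism) {
        return subtaskForKeyGroup(partition, numPartitions, parallelism);
    }
}
```

For example, with 8 partitions and parallelism 4, partitions 0-1 land on subtask 0, 2-3 on subtask 1, and so on, keeping key-partitioned data on the subtask that owns the matching key groups.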