[ https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108783#comment-17108783 ]
Yuan Mei edited comment on FLINK-15670 at 5/16/20, 12:30 AM:
-------------------------------------------------------------
Things to follow up and discuss (listed here so I do not forget about them):
# Address cases where the number of partitions != the number of consumer tasks
# Batch-emit Kafka fetcher records (similar to FLINK-17307, "Add collector to deserialize in KafkaDeserializationSchema")
# Whether to separate the sink (producer) and the source (consumer) into different jobs
** Although the two are recovered independently under regional failover, they share the same checkpoint coordinator, and correspondingly the same global checkpoint snapshot
** That is, if the consumer fails, the producer cannot commit the written data because of the two-phase-commit setup (it needs a checkpoint-complete signal to finish the second phase)
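The two-phase-commit dependency in the last sub-point can be sketched as follows. This is a minimal illustration with hypothetical class and method names, not Flink's actual `TwoPhaseCommitSinkFunction` API: pre-committed transactions only become visible once the checkpoint coordinator signals checkpoint completion, so a failure anywhere in the shared checkpoint scope delays the commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Flink's real API) of why the producer's second
// commit phase depends on a checkpoint-complete signal that a failed
// consumer in the same job can block.
class TwoPhaseCommitSketch {
    // Transactions pre-committed at checkpoint time, keyed by checkpoint id.
    private final Map<Long, List<String>> pending = new HashMap<>();
    private final List<String> committed = new ArrayList<>();

    // Phase 1: on checkpoint, flush and pre-commit the open transaction.
    void preCommit(long checkpointId, List<String> records) {
        pending.put(checkpointId, new ArrayList<>(records));
    }

    // Phase 2: runs only when the checkpoint coordinator reports success.
    // If any task in the job (e.g. the consumer) fails before that signal,
    // this is never called and the data stays invisible to downstream readers.
    void notifyCheckpointComplete(long checkpointId) {
        List<String> txn = pending.remove(checkpointId);
        if (txn != null) {
            committed.addAll(txn);
        }
    }

    List<String> visibleRecords() {
        return committed;
    }
}
```

Splitting producer and consumer into separate jobs would give each its own checkpoint coordinator, removing this coupling.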
> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's
> KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Assignee: Yuan Mei
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.11.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job
> into smaller jobs and independent pipelined regions that fail over
> independently.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
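The alignment idea behind purpose 1 can be sketched as follows. This is a simplified illustration under the assumption that each Kafka partition is treated as one key group; Flink's real logic lives in `KeyGroupRangeAssignment` and murmur-hashes the key first, but the key-group-to-subtask formula used here matches the shape of Flink's range assignment. When the partition count does not equal the task count (follow-up item 1 in the comment above), some subtasks own several partitions and some may own none.

```java
// Hypothetical sketch of aligning Kafka partitions with Flink key groups so
// a keyed operator can consume its partitions without a shuffle.
class PartitionAlignmentSketch {
    // Flink maps a key group to an operator subtask roughly like this:
    // subtasks own contiguous ranges of the [0, maxParallelism) key-group space.
    static int subtaskForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Treating partition i as key group i (with numPartitions playing the
    // role of maxParallelism): assign the partition to the subtask that
    // owns that key group, so keyed state and partition data co-locate.
    static int subtaskForPartition(int partition, int numPartitions, int parallelism) {
        return subtaskForKeyGroup(partition, numPartitions, parallelism);
    }
}
```

For example, with 8 partitions and parallelism 4, partitions 0-1 land on subtask 0, 2-3 on subtask 1, and so on, keeping key-partitioned data on the subtask that owns the matching key groups.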