[ 
https://issues.apache.org/jira/browse/FLINK-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050031#comment-17050031
 ] 

Yuan Mei edited comment on FLINK-15670 at 3/4/20, 6:37 AM:
-----------------------------------------------------------

[~sewen]

We need to discuss two things:
 # Redefining the scope of the problem, at least for 1.11.
 # Watermark handling when multiple subtasks write to the same partition
 ** This is a common problem for intermediate persistence in general, not just for Kafka.
 ** The current mechanism relies on the downstream `ExecutionVertex` to advance the 
watermark. In the case of a sink, however, there is no 
`downstream OP`.
 ** If there were a coordinator for all subtasks of an 
`ExecutionJobVertex`, the watermark progress logic could be handled in that 
coordinator.
 ** There is an `OperatorCoordinator` interface that might be usable 
here, but its only two usages are under `test`.

 

*A bit more details for reference:*

On the downstream side, watermarks are handled in

`StreamTaskNetworkInput.processElement` ->

`StatusWatermarkValve.inputWatermark`

In this path, the watermark of each input channel is tracked separately and 
aligned across channels once it reaches the downstream task.
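
For reference, the per-channel alignment can be sketched roughly as below. This is a simplified illustration and not Flink's actual code: `ChannelWatermarkAligner` and `onWatermark` are hypothetical names, idle-channel (stream status) handling is left out, and the only assumption is that the combined watermark is the minimum over all channels and is forwarded only when it advances.

```java
import java.util.Arrays;
import java.util.OptionalLong;

/** Hypothetical helper (not a Flink class): min-based watermark alignment over channels. */
public class ChannelWatermarkAligner {
    // Last watermark seen on each input channel.
    private final long[] channelWatermarks;
    // Last combined watermark that was forwarded.
    private long lastEmitted = Long.MIN_VALUE;

    public ChannelWatermarkAligner(int numChannels) {
        if (numChannels < 1) {
            throw new IllegalArgumentException("need at least one channel");
        }
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /**
     * Records a watermark for one channel. Returns the new combined watermark
     * if the minimum over all channels advanced, otherwise an empty result.
     */
    public OptionalLong onWatermark(int channel, long watermark) {
        if (watermark <= channelWatermarks[channel]) {
            return OptionalLong.empty(); // watermarks never move backwards
        }
        channelWatermarks[channel] = watermark;
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        if (min > lastEmitted) {
            lastEmitted = min;
            return OptionalLong.of(min);
        }
        return OptionalLong.empty();
    }
}
```

The same bookkeeping, keyed by subtask index instead of channel index, is what a coordinator would need if the sink side had to advance a per-partition watermark itself.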

 

Upstream data is buffered by `ChannelSelectorRecordWriter`, which 
maintains `bufferBuilders`, one for each subpartition (channel).


> Provide a Kafka Source/Sink pair that aligns Kafka's Partitions and Flink's 
> KeyGroups
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-15670
>                 URL: https://issues.apache.org/jira/browse/FLINK-15670
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream, Connectors / Kafka
>            Reporter: Stephan Ewen
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> This Source/Sink pair would serve two purposes:
> 1. You can read topics that are already partitioned by key and process them 
> without partitioning them again (avoid shuffles)
> 2. You can use this to shuffle through Kafka, thereby decomposing the job 
> into smaller jobs and independent pipelined regions that fail over 
> independently.


