[ https://issues.apache.org/jira/browse/FLINK-12576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935937#comment-16935937 ]
David Anderson edited comment on FLINK-12576 at 9/23/19 2:58 PM: ----------------------------------------------------------------- To reproduce, git clone --branch backpressure-with-2-TMs https://github.com/alpinegizmo/flink-playgrounds.git cd flink-playgrounds/operations-playground docker-compose build docker-compose up -d You will find a job with these 5 operators (1) kafka -> (2) timestamps / watermarks -> (3) keyBy + backpressure map -> (4) keyBy + window -> (5) kafka where #3, the backpressure map, causes severe backpressure every other minute. The job is running with parallelism of 2 throughout; up until the first keyBy all the traffic is on the subtasks with 0 as their index. In this backpressure-with-2-TMs branch there are two TMs each with one slot. You will observe that all of the output metrics for the 0-index watermarking subtask rise to 1 during the even-numbered minutes, and fall to 0 during the odd numbered minutes, as expected. If I run this with one TM with 2 slots, all of the input metrics for the backpressure operator are always zero. In this backpressure-with-2-TMs branch there are 2 single-slot TMs, and here the input metrics for subtask 0 of the backpressure operator are always 0, but the input metrics for subtask 1 of that operator rise and fall every minute, as they should. Since subtask 0 is handling 2x as many records as subtask 1, I conclude that the local input metrics are still broken. was (Author: alpinegizmo): To reproduce, git clone --branch backpressure-with-2-TMs https://github.com/alpinegizmo/flink-playgrounds.git cd flink-playgrounds/operations-playground docker-compose build docker-compose up -d You will find a job with these 5 operators (1) kafka -> (2) timestamps / watermarks -> (3) keyBy + backpressure map -> (4) keyBy + window -> (5) kafka where #3, the backpressure map, causes severe backpressure every other minute. The job is running with parallelism of 2 throughout; up until the first keyBy all the traffic is on the subtasks with 0 as their index. In this backpressure-with-2-TMs branch there are two TMs each with one slot. You will observe that all of the output metrics for the 0-index watermarking subtask rise to 1 during the even-numbered minutes, and fall to 0 during the odd numbered minutes, as expected. If I run this with one TM with 2 slots, all of the input metrics for the backpressure operator are always zero. In this backpressure-with-2-TMs branch there are 2 single-slot TMs, and here the input metrics for subtask 0 of the backpressure operator are always 0, but the input metrics for subtask 1 of that operator rise and fall every minute, as they should. Thus my conclusion that the local input metrics are still broken. > inputQueueLength metric does not work for LocalInputChannels > ------------------------------------------------------------ > > Key: FLINK-12576 > URL: https://issues.apache.org/jira/browse/FLINK-12576 > Project: Flink > Issue Type: Bug > Components: Runtime / Metrics, Runtime / Network > Affects Versions: 1.6.4, 1.7.2, 1.8.0 > Reporter: Piotr Nowojski > Assignee: Aitozi > Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently {{inputQueueLength}} ignores LocalInputChannels > ({{SingleInputGate#getNumberOfQueuedBuffers}}). This can can cause mistakes > when looking for causes of back pressure (If task is back pressuring whole > Flink job, but there is a data skew and only local input channels are being > used). -- This message was sent by Atlassian Jira (v8.3.4#803005)