[jira] [Comment Edited] (FLINK-12576) inputQueueLength metric does not work for LocalInputChannels

David Anderson (Jira) Mon, 23 Sep 2019 07:59:11 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-12576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935937#comment-16935937
 ]


David Anderson edited comment on FLINK-12576 at 9/23/19 2:58 PM:
-----------------------------------------------------------------

To reproduce,


git clone --branch backpressure-with-2-TMs https://github.com/alpinegizmo/flink-playgrounds.git
cd flink-playgrounds/operations-playground
docker-compose build
docker-compose up -d

You will find a job with these 5 operators:

(1) kafka -> (2) timestamps / watermarks -> (3) keyBy + backpressure map -> (4) 
keyBy + window -> (5) kafka

where #3, the backpressure map, causes severe backpressure every other minute. 
The job is running with parallelism of 2 throughout; up until the first keyBy 
all the traffic is on the subtasks with 0 as their index. 

In this backpressure-with-2-TMs branch there are two TMs, each with one slot. 
You will observe that all of the output metrics for the 0-index watermarking 
subtask rise to 1 during the even-numbered minutes, and fall to 0 during the 
odd-numbered minutes, as expected. 

If I run this with one TM with 2 slots, all of the input metrics for the 
backpressure operator are always zero. 

With the 2 single-slot TMs used in this backpressure-with-2-TMs branch, the 
input metrics for subtask 0 of the backpressure operator are always 0, but the 
input metrics for subtask 1 of that operator rise and fall every minute, as 
they should. Since subtask 0 is handling 2x as many records as subtask 1, its 
input metrics should not stay at 0, so I conclude that the local input metrics 
are still broken.
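
One way to watch these input metrics outside the web UI is to poll the Flink
REST API. The following is a minimal sketch and not part of the playground. It
assumes the REST endpoint is reachable at http://localhost:8081, that the
subtask metrics path has the shape shown, and that the metric is exposed under
the id buffers.inputQueueLength. The job and vertex ids are placeholders that
have to be looked up separately.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def subtask_metrics_url(base_url, job_id, vertex_id, subtask_index, metric_ids):
    """Build the URL for a subtask-level metrics query (path shape is an assumption)."""
    path = f"/jobs/{job_id}/vertices/{vertex_id}/subtasks/{subtask_index}/metrics"
    query = quote(",".join(metric_ids), safe=",")
    return f"{base_url.rstrip('/')}{path}?get={query}"


def parse_metrics(body):
    """Convert a JSON list of {"id": ..., "value": ...} entries into {id: float}."""
    return {entry["id"]: float(entry["value"]) for entry in json.loads(body)}


def fetch_subtask_metrics(base_url, job_id, vertex_id, subtask_index, metric_ids):
    """Fetch and parse the requested metrics for one subtask."""
    url = subtask_metrics_url(base_url, job_id, vertex_id, subtask_index, metric_ids)
    with urlopen(url) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_subtask_metrics once per subtask index (0 and 1) of the
backpressure operator every few seconds should make it easy to see whether one
subtask stays at 0 while the other rises and falls.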




was (Author: alpinegizmo):
To reproduce,


git clone --branch backpressure-with-2-TMs https://github.com/alpinegizmo/flink-playgrounds.git
cd flink-playgrounds/operations-playground
docker-compose build
docker-compose up -d

You will find a job with these 5 operators

(1) kafka -> (2) timestamps / watermarks -> (3) keyBy + backpressure map -> (4) 
keyBy + window -> (5) kafka

where #3, the backpressure map, causes severe backpressure every other minute. 
The job is running with parallelism of 2 throughout; up until the first keyBy 
all the traffic is on the subtasks with 0 as their index. 

In this backpressure-with-2-TMs branch there are two TMs, each with one slot. 
You will observe that all of the output metrics for the 0-index watermarking 
subtask rise to 1 during the even-numbered minutes, and fall to 0 during the 
odd-numbered minutes, as expected. 

If I run this with one TM with 2 slots, all of the input metrics for the 
backpressure operator are always zero. 

In this backpressure-with-2-TMs branch there are 2 single-slot TMs, and here 
the input metrics for subtask 0 of the backpressure operator are always 0, but 
the input metrics for subtask 1 of that operator rise and fall every minute, as 
they should. Thus my conclusion that the local input metrics are still broken.



> inputQueueLength metric does not work for LocalInputChannels
> ------------------------------------------------------------
>
>                 Key: FLINK-12576
>                 URL: https://issues.apache.org/jira/browse/FLINK-12576
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Metrics, Runtime / Network
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Piotr Nowojski
>            Assignee: Aitozi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently {{inputQueueLength}} ignores LocalInputChannels 
> ({{SingleInputGate#getNumberOfQueuedBuffers}}). This can cause mistakes 
> when looking for the causes of back pressure (for example, if a task is back 
> pressuring the whole Flink job, but there is data skew and only local input 
> channels are being used).
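
The effect described in the issue can be illustrated with a simplified model.
This is a sketch under stated assumptions, not Flink's actual code: Channel,
queue_length_remote_only and queue_length_all are hypothetical names, and the
buffer counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical stand-in for one input channel of an input gate."""
    is_local: bool
    queued_buffers: int


def queue_length_remote_only(channels):
    """Models the reported behavior: buffers on local channels are not counted."""
    return sum(c.queued_buffers for c in channels if not c.is_local)


def queue_length_all(channels):
    """Models the expected behavior: buffers on every channel are counted."""
    return sum(c.queued_buffers for c in channels)


# Data skew: all queued data arrives over the local channel.
skewed_gate = [
    Channel(is_local=True, queued_buffers=7),
    Channel(is_local=False, queued_buffers=0),
]

print(queue_length_remote_only(skewed_gate))  # the metric looks idle
print(queue_length_all(skewed_gate))          # the backlog is visible
```

In this model the remote-only count is 0 even though 7 buffers are queued,
which is how a back-pressured task can look idle when judged by this metric
alone.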



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
