[jira] [Commented] (FLINK-19068) Filter verbose pod events for KubernetesResourceManagerDriver

Xintong Song (Jira) Fri, 23 Oct 2020 02:08:49 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219560#comment-17219560
 ]


Xintong Song commented on FLINK-19068:
--------------------------------------

Taking a closer look at this issue, we found that it is not trivial to filter 
out events in {{KubernetesResourceManagerDriver.PodCallbackHandlerImpl}}. Doing 
so requires the driver to maintain status for non-terminated pods, which goes 
against the idea behind FLINK-18620 that all worker status is maintained in 
{{ActiveResourceManager}}.

An alternative approach is to adjust the logs. We should print info logs only 
for the non-duplicated events, and print the duplicated events at debug level.
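
As a rough illustration of the log-level idea only (not the code in the PR), the 
sketch below picks a level per event. {{PodEventLogLevels}} and its methods are 
hypothetical names, and the set of pod names stands in for whatever worker 
status is already tracked elsewhere.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Flink: picks a log level per pod event. */
final class PodEventLogLevels {

    /** Names of pods whose termination has already been logged at info level. */
    private final Set<String> reportedTerminated = new HashSet<>();

    /**
     * Returns "INFO" the first time a pod is reported as terminated and
     * "DEBUG" for every later event that reports the same pod as terminated.
     */
    String levelForTerminatedEvent(String podName) {
        // Set.add returns true only when the name was not present before.
        return reportedTerminated.add(podName) ? "INFO" : "DEBUG";
    }

    /** Forgets a pod (an assumed cleanup step) so the set does not grow forever. */
    void forget(String podName) {
        reportedTerminated.remove(podName);
    }
}
```

With this, the first terminated event for a pod would be logged at info level 
and any repeats of it at debug level.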

I'm opening a PR with the alternative approach.

> Filter verbose pod events for KubernetesResourceManagerDriver
> -------------------------------------------------------------
>
>                 Key: FLINK-19068
>                 URL: https://issues.apache.org/jira/browse/FLINK-19068
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>            Reporter: Xintong Song
>            Priority: Major
>
> The status of a Kubernetes pod consists of many detailed fields. Currently, 
> Flink receives pod {{MODIFIED}} events from the {{KubernetesPodsWatcher}} on 
> every single change to these fields, many of which Flink does not care about.
> The verbose events will not affect the functionality of Flink, but will 
> pollute the logs with repeated messages, because Flink only looks into the 
> fields it is interested in, and those fields are identical across the events.
> E.g., when a task manager is stopped due to idle timeout, Flink receives 3 
> events:
> * MODIFIED: container terminated
> * MODIFIED: {{deletionGracePeriodSeconds}} changes from 30 to 0, which is a 
> Kubernetes internal status change after containers are gracefully terminated
> * DELETED: Flink removes metadata of the terminated pod
> Among the 3 events, Flink is only interested in the 1st MODIFIED event, 
> but will try to process all of them because the container status is 
> terminated in each of them.
> I propose to filter the verbose events in 
> {{KubernetesResourceManagerDriver.PodCallbackHandlerImpl}}, to only process 
> the status changes that Flink is interested in. This probably requires 
> recording the status of all living pods, to compare with the incoming events 
> for detecting status changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
