[
https://issues.apache.org/jira/browse/FLINK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219560#comment-17219560
]
Xintong Song commented on FLINK-19068:
--------------------------------------
Taking a closer look at this issue, we find it not trivial to filter out events
in {{KubernetesResourceManagerDriver.PodCallbackHandlerImpl}}. It requires
properly maintaining the status of non-terminated pods, which goes against the
idea behind FLINK-18620 that all worker status is maintained in
{{ActiveResourceManager}}.
An alternative approach is to adjust the logs. We should print info logs only
for the non-duplicated events, and print the duplicated events at debug level.
I'm opening a PR with the alternative approach.
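To illustrate the logging idea only, here is a minimal sketch. It is not the code in the PR: the class and method names are hypothetical, and it uses java.util.logging (with FINE standing in for debug level) just to stay self-contained. The assumption is that the component tracking workers can tell the first termination notification from the later, duplicated ones.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of level-based logging for duplicated events; not an actual Flink class. */
class TerminationLogSketch {
    private static final Logger LOG =
            Logger.getLogger(TerminationLogSketch.class.getName());

    // Workers that are still tracked (a stand-in for the worker status
    // that is maintained in one place, outside the driver).
    private final Set<String> trackedWorkers = new HashSet<>();

    void onWorkerStarted(String podName) {
        trackedWorkers.add(podName);
    }

    /** Handles a "pod terminated" notification and returns the level it was logged at. */
    Level onWorkerTerminated(String podName) {
        // remove() returns true only for the first notification about a tracked worker;
        // later notifications for the same pod are treated as duplicates.
        boolean firstNotification = trackedWorkers.remove(podName);
        Level level = firstNotification ? Level.INFO : Level.FINE;
        LOG.log(level, "Worker {0} is terminated.", podName);
        return level;
    }
}
```

With the 3 events from the issue description, the first MODIFIED event would be logged at info level, and the second MODIFIED and the DELETED events at debug level.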
> Filter verbose pod events for KubernetesResourceManagerDriver
> -------------------------------------------------------------
>
> Key: FLINK-19068
> URL: https://issues.apache.org/jira/browse/FLINK-19068
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Reporter: Xintong Song
> Priority: Major
>
> The status of a Kubernetes pod consists of many detailed fields. Currently,
> Flink receives pod {{MODIFIED}} events from the {{KubernetesPodsWatcher}} on
> every single change to these fields, many of which Flink does not care about.
> The verbose events will not affect the functionality of Flink, but will
> pollute the logs with repeated messages, because Flink only looks into the
> fields it is interested in, and those fields are identical across the events.
> E.g., when a task manager is stopped due to idle timeout, Flink receives 3
> events:
> * MODIFIED: container terminated
> * MODIFIED: {{deletionGracePeriodSeconds}} changes from 30 to 0, which is a
> Kubernetes internal status change after containers are gracefully terminated
> * DELETED: Flink removes metadata of the terminated pod
> Among the 3 messages, Flink is only interested in the 1st MODIFIED message,
> but will try to process all of them because the container status is
> terminated.
> I propose to filter the verbose events in
> {{KubernetesResourceManagerDriver.PodCallbackHandlerImpl}}, to only process
> the status changes that Flink is interested in. This probably requires
> recording the status of all living pods, to compare with the incoming events
> for detecting status changes.