[jira] [Commented] (FLINK-28024) Azure worker stops operating due to log files becoming too big

Matthias Pohl (Jira) Tue, 21 Jun 2022 03:16:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556792#comment-17556792
 ]


Matthias Pohl commented on FLINK-28024:
---------------------------------------

[~roman] could you have a look at this? It looks like the thread in 
[ChannelStateWriteRequestExecutorImpl|https://github.com/apache/flink/blob/d1997b827a0e21308c57450dd7a6df1e8efa5bce/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequestExecutorImpl.java#L166]
 cannot be stopped.

> Azure worker stops operating due to log files becoming too big
> --------------------------------------------------------------
>
>                 Key: FLINK-28024
>                 URL: https://issues.apache.org/jira/browse/FLINK-28024
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / Azure Pipelines
>    Affects Versions: 1.16.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Blocker
>              Labels: test-stability
>         Attachments: testWithRocksDbBackendIncremental.log.gz
>
>
> We have already observed several cases where log files grew to over 120G. 
> This drove the worker's disk usage to 100%, causing the worker machine to 
> go "offline", i.e. it was no longer available to pick up new tasks.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
