[
https://issues.apache.org/jira/browse/FLINK-28024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556792#comment-17556792
]
Matthias Pohl commented on FLINK-28024:
---------------------------------------
[~roman] could you have a look at this? It looks like the thread in
[ChannelStateWriteRequestExecutorImpl|https://github.com/apache/flink/blob/d1997b827a0e21308c57450dd7a6df1e8efa5bce/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/channel/ChannelStateWriteRequestExecutorImpl.java#L166]
cannot be stopped.
> Azure worker stops operating due to log files becoming too big
> --------------------------------------------------------------
>
> Key: FLINK-28024
> URL: https://issues.apache.org/jira/browse/FLINK-28024
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines
> Affects Versions: 1.16.0
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Blocker
> Labels: test-stability
> Attachments: testWithRocksDbBackendIncremental.log.gz
>
>
> We have already observed several cases where log files grew to over 120G.
> This pushed the worker's disk usage to 100%, causing the worker machine to go
> "offline", i.e. it was no longer available to pick up new tasks.