[
https://issues.apache.org/jira/browse/FLINK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441952#comment-17441952
]
Roman Khachatryan commented on FLINK-24433:
-------------------------------------------
I suspected that the changelog state backend might have left some files undeleted,
but the e2e logs show that it wasn't enabled.
However, both failures selected exactly the same values:
{code:java}
find ./ -type f -exec grep 'Randomly ' {} \; | less
PseudoRandomValueSelector [] - Randomly selected false for execution.checkpointing.unaligned
PseudoRandomValueSelector [] - Randomly selected PT2S for execution.checkpointing.alignment-timeout
PseudoRandomValueSelector [] - Randomly selected false for state.backend.changelog.enabled
PseudoRandomValueSelector [] - Randomly selected false for taskmanager.network.memory.buffer-debloat.enabled
PseudoRandomValueSelector [] - Randomly selected false for execution.checkpointing.unaligned
PseudoRandomValueSelector [] - Randomly selected PT2S for execution.checkpointing.alignment-timeout
PseudoRandomValueSelector [] - Randomly selected false for state.backend.changelog.enabled
PseudoRandomValueSelector [] - Randomly selected false for taskmanager.network.memory.buffer-debloat.enabled
{code}
This suggests that some test probably creates too much back-pressure and can
hang when both unaligned checkpoints and buffer debloating are disabled.
I didn't check the other phases, though.
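To make the comparison above repeatable for other failed builds, here is a minimal sketch that extracts the "Randomly selected ... for ..." pairs from each run's log text and reports the options whose values differ between runs. It assumes the log line format shown above; the names `selected_values` and `differing_options` are hypothetical helpers for illustration, not part of Flink or its test tooling.

```python
import re
from collections import defaultdict

# Matches log lines such as:
#   PseudoRandomValueSelector [] - Randomly selected false for execution.checkpointing.unaligned
# "\s+" after "for" also tolerates a line break between the value and the option key.
SELECTED = re.compile(r"Randomly selected (\S+) for\s+(\S+)")


def selected_values(log_text):
    """Map each option key to the set of values randomly selected for it."""
    values = defaultdict(set)
    for match in SELECTED.finditer(log_text):
        value, key = match.groups()
        values[key].add(value)
    return dict(values)


def differing_options(runs):
    """Return the options whose selected values are not identical across runs."""
    merged = defaultdict(set)
    for log_text in runs:
        for key, values in selected_values(log_text).items():
            merged[key] |= values
    return {key: vals for key, vals in merged.items() if len(vals) > 1}
```

An empty result from `differing_options` means every run picked the same values, which is the situation observed in the two failures here.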
> "No space left on device" in Azure e2e tests
> --------------------------------------------
>
> Key: FLINK-24433
> URL: https://issues.apache.org/jira/browse/FLINK-24433
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines
> Affects Versions: 1.15.0
> Reporter: Dawid Wysakowicz
> Priority: Major
> Labels: test-stability
> Fix For: 1.15.0
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=24668&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=070ff179-953e-5bda-71fa-d6599415701c&l=19772
> {code}
> Sep 30 17:08:42 Job has been submitted with JobID 5594c18e128a328ede39cfa59cb3cb07
> Sep 30 17:08:56 2021-09-30 17:08:56,809 main ERROR Recovering from StringBuilderEncoder.encode('2021-09-30 17:08:56,807 WARN org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An exception occurred when fetching query results
> Sep 30 17:08:56 java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
> Sep 30 17:08:56 org.apache.flink.runtime.messages.FlinkJobNotFoundException: Could not find Flink job (5594c18e128a328ede39cfa59cb3cb07)
> Sep 30 17:08:56 at org.apache.flink.runtime.dispatcher.Dispatcher.getJobMasterGateway(Dispatcher.java:923)
> Sep 30 17:08:56 at org.apache.flink.runtime.dispatcher.Dispatcher.performOperationOnJobMasterGateway(Dispatcher.java:937)
> Sep 30 17:08:56 at org.apache.flink.runtime.dispatcher.Dispatcher.deliverCoordinationRequestToCoordina2021-09-30T17:08:57.1584224Z ##[error]No space left on device
> {code}