[
https://issues.apache.org/jira/browse/FLINK-31278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696944#comment-17696944
]
Matthias Pohl commented on FLINK-31278:
---------------------------------------
{quote}
Could you elaborate, how disabling fork reuse would help?
{quote}
Disabling fork reuse and parallel execution would enable us to identify the
exact test that caused the OOM. The test that was started last is not
necessarily the only one that could have exhausted the heap space. You're right
to be concerned about test runtime. But it looks like the runtime for core will
only increase from ~40 min to ~1 h 10 min, based on the CI run I did in [this
issue's PR|https://github.com/apache/flink/pull/22052]. I will have another
look at the {{MemoryExecutionGraphInfoStoreTest}} as well.
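
To illustrate why the last started test is not necessarily the culprit, here is
a minimal sketch. It assumes the Maven console log is available as plain text
with one entry per line (unwrapped, unlike the excerpt quoted below). The
function name {{unfinished_tests}} and the regular expressions are illustrative
and not part of Flink or Surefire. It lists every test class that logged
"Running" but never logged a "Tests run: ... - in" summary before the process
was killed.

```python
import re

# Assumed Surefire console format: "[INFO] Running <class>" when a test class
# starts and "Tests run: ... - in <class>" when it completes.
RUNNING = re.compile(r"\[INFO\] Running\s+(\S+)")
FINISHED = re.compile(r"Tests run: .* - in\s+(\S+)")

# Sample taken from the log in the issue description, with the e-mail line
# wrapping undone so that each log entry sits on a single line.
SAMPLE_LOG = """\
Mar 01 05:29:06 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.65 s - in org.apache.flink.runtime.io.compression.BlockCompressionTest
Mar 01 05:29:06 [INFO] Running org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
Mar 01 05:29:07 [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.142 s - in org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
Mar 01 05:29:08 [INFO] Running org.apache.flink.runtime.dispatcher.MemoryExecutionGraphInfoStoreTest
"""


def unfinished_tests(log_text):
    """Return test classes that started but never reported a result."""
    started = []      # classes in the order they logged "Running"
    finished = set()  # classes that logged a "Tests run: ... - in" summary
    for line in log_text.splitlines():
        match = RUNNING.search(line)
        if match:
            started.append(match.group(1))
            continue
        match = FINISHED.search(line)
        if match:
            finished.add(match.group(1))
    return [name for name in started if name not in finished]


if __name__ == "__main__":
    for name in unfinished_tests(SAMPLE_LOG):
        print(name)
```

With parallel forks, several classes can end up in that list at once. With
reused forks, a class that already finished may still have left memory behind
in the same JVM. The output is therefore only a list of candidates, which is
the reason for running with one fresh fork per test class while investigating.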
> exit code 137 (i.e. OutOfMemoryError) in core module
> ----------------------------------------------------
>
> Key: FLINK-31278
> URL: https://issues.apache.org/jira/browse/FLINK-31278
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Blocker
> Labels: pull-request-available, test-stability
>
> The following build failed due to a 137 exit code indicating an
> OutOfMemoryError:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46643&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=7847
> {code}
> [...]
> Mar 01 05:29:06 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time
> elapsed: 0.65 s - in
> org.apache.flink.runtime.io.compression.BlockCompressionTest
> Mar 01 05:29:06 [INFO] Running
> org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
> Mar 01 05:29:07 [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time
> elapsed: 1.142 s - in
> org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
> Mar 01 05:29:08 [INFO] Running
> org.apache.flink.runtime.dispatcher.MemoryExecutionGraphInfoStoreTest
> ##[error]Exit code 137 returned from process: file name '/usr/bin/docker',
> arguments 'exec -i -u 1001 -w /home/vsts_azpcontainer
> 5953b171e8ed4caba7af2b326533e249211ed4dcc48640edb3c1b0cbbcdf1a21
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - core
> {code}
> This build ran on an Azure pipeline machine (Azure Pipelines 9) and,
> therefore, cannot be caused by FLINK-18356. That said, there was a concurrent
> 137 exit code build failure happening on agent "Azure Pipelines 21" (see
> [20230301.3|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46643&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=7847])
> ~10 min later.