[ https://issues.apache.org/jira/browse/FLINK-25426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465131#comment-17465131 ]

Yun Gao commented on FLINK-25426:
---------------------------------

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28553&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae&l=14634

> UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint fails on 
> AZP because it cannot allocate enough network buffers
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25426
>                 URL: https://issues.apache.org/jira/browse/FLINK-25426
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0
>            Reporter: Till Rohrmann
>            Priority: Blocker
>              Labels: test-stability
>             Fix For: 1.15.0
>
>
> The test 
> {{UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint}} fails 
> with
> {code}
> 2021-12-23T02:54:46.2862342Z Dec 23 02:54:46 [ERROR] 
> UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint  Time 
> elapsed: 2.992 s  <<< ERROR!
> 2021-12-23T02:54:46.2865774Z Dec 23 02:54:46 java.lang.OutOfMemoryError: 
> Could not allocate enough memory segments for NetworkBufferPool (required 
> (Mb): 64, allocated (Mb): 14, missing (Mb): 50). Cause: Direct buffer memory. 
> The direct out-of-memory error has occurred. This can mean two things: either 
> job(s) require(s) a larger size of JVM direct memory or there is a direct 
> memory leak. The direct memory can be allocated by user code or some of its 
> dependencies. In this case 'taskmanager.memory.task.off-heap.size' 
> configuration option should be increased. Flink framework and its 
> dependencies also consume the direct memory, mostly for network 
> communication. The most of network memory is managed by Flink and should not 
> result in out-of-memory error. In certain special cases, in particular for 
> jobs with high parallelism, the framework may require more direct memory 
> which is not managed by Flink. In this case 
> 'taskmanager.memory.framework.off-heap.size' configuration option should be 
> increased. If the error persists then there is probably a direct memory leak 
> in user code or some of its dependencies which has to be investigated and 
> fixed. The task executor has to be shutdown...
> 2021-12-23T02:54:46.2868239Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.<init>(NetworkBufferPool.java:138)
> 2021-12-23T02:54:46.2868975Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createNettyShuffleEnvironment(NettyShuffleServiceFactory.java:140)
> 2021-12-23T02:54:46.2869771Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createNettyShuffleEnvironment(NettyShuffleServiceFactory.java:94)
> 2021-12-23T02:54:46.2870550Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createShuffleEnvironment(NettyShuffleServiceFactory.java:79)
> 2021-12-23T02:54:46.2871312Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.io.network.NettyShuffleServiceFactory.createShuffleEnvironment(NettyShuffleServiceFactory.java:58)
> 2021-12-23T02:54:46.2872062Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.taskexecutor.TaskManagerServices.createShuffleEnvironment(TaskManagerServices.java:414)
> 2021-12-23T02:54:46.2872767Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.taskexecutor.TaskManagerServices.fromConfiguration(TaskManagerServices.java:282)
> 2021-12-23T02:54:46.2873436Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:523)
> 2021-12-23T02:54:46.2877615Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.minicluster.MiniCluster.startTaskManager(MiniCluster.java:645)
> 2021-12-23T02:54:46.2878247Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.minicluster.MiniCluster.startTaskManagers(MiniCluster.java:626)
> 2021-12-23T02:54:46.2878856Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:379)
> 2021-12-23T02:54:46.2879487Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.testutils.MiniClusterResource.startMiniCluster(MiniClusterResource.java:209)
> 2021-12-23T02:54:46.2880152Z Dec 23 02:54:46  at 
> org.apache.flink.runtime.testutils.MiniClusterResource.before(MiniClusterResource.java:95)
> 2021-12-23T02:54:46.2880821Z Dec 23 02:54:46  at 
> org.apache.flink.test.util.MiniClusterWithClientResource.before(MiniClusterWithClientResource.java:64)
> 2021-12-23T02:54:46.2881519Z Dec 23 02:54:46  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:151)
> 2021-12-23T02:54:46.2882310Z Dec 23 02:54:46  at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointRescaleITCase.shouldRescaleUnalignedCheckpoint(UnalignedCheckpointRescaleITCase.java:534)
> 2021-12-23T02:54:46.2882978Z Dec 23 02:54:46  at 
> jdk.internal.reflect.GeneratedMethodAccessor123.invoke(Unknown Source)
> 2021-12-23T02:54:46.2883574Z Dec 23 02:54:46  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2021-12-23T02:54:46.2884171Z Dec 23 02:54:46  at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 2021-12-23T02:54:46.2884732Z Dec 23 02:54:46  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 2021-12-23T02:54:46.2885527Z Dec 23 02:54:46  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2021-12-23T02:54:46.2886135Z Dec 23 02:54:46  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 2021-12-23T02:54:46.2886755Z Dec 23 02:54:46  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2021-12-23T02:54:46.2887387Z Dec 23 02:54:46  at 
> org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
> 2021-12-23T02:54:46.2887892Z Dec 23 02:54:46  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 2021-12-23T02:54:46.2888435Z Dec 23 02:54:46  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 2021-12-23T02:54:46.2889007Z Dec 23 02:54:46  at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> 2021-12-23T02:54:46.2889568Z Dec 23 02:54:46  at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> 2021-12-23T02:54:46.2890104Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2021-12-23T02:54:46.2890686Z Dec 23 02:54:46  at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> 2021-12-23T02:54:46.2891259Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> 2021-12-23T02:54:46.2891819Z Dec 23 02:54:46  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> 2021-12-23T02:54:46.2892421Z Dec 23 02:54:46  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> 2021-12-23T02:54:46.2892978Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 2021-12-23T02:54:46.2893508Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 2021-12-23T02:54:46.2894049Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 2021-12-23T02:54:46.2894588Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 2021-12-23T02:54:46.2895203Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 2021-12-23T02:54:46.2895721Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 2021-12-23T02:54:46.2896304Z Dec 23 02:54:46  at 
> org.junit.runners.Suite.runChild(Suite.java:128)
> 2021-12-23T02:54:46.2896781Z Dec 23 02:54:46  at 
> org.junit.runners.Suite.runChild(Suite.java:27)
> 2021-12-23T02:54:46.2897359Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> 2021-12-23T02:54:46.2897892Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> 2021-12-23T02:54:46.2898429Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> 2021-12-23T02:54:46.2898968Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> 2021-12-23T02:54:46.2899487Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> 2021-12-23T02:54:46.2900025Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> 2021-12-23T02:54:46.2900542Z Dec 23 02:54:46  at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> 2021-12-23T02:54:46.2901044Z Dec 23 02:54:46  at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 2021-12-23T02:54:46.2901540Z Dec 23 02:54:46  at 
> org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> 2021-12-23T02:54:46.2902086Z Dec 23 02:54:46  at 
> org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
> 2021-12-23T02:54:46.2902702Z Dec 23 02:54:46  at 
> org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
> 2021-12-23T02:54:46.2903297Z Dec 23 02:54:46  at 
> org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
> 2021-12-23T02:54:46.2903944Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
> 2021-12-23T02:54:46.2904712Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
> 2021-12-23T02:54:46.2905493Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
> 2021-12-23T02:54:46.2906245Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
> 2021-12-23T02:54:46.2906968Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
> 2021-12-23T02:54:46.2907692Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
> 2021-12-23T02:54:46.2908303Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86)
> 2021-12-23T02:54:46.2908971Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86)
> 2021-12-23T02:54:46.2909664Z Dec 23 02:54:46  at 
> org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53)
> 2021-12-23T02:54:46.2910347Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:188)
> 2021-12-23T02:54:46.2911042Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
> 2021-12-23T02:54:46.2911743Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:124)
> 2021-12-23T02:54:46.2912399Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
> 2021-12-23T02:54:46.2913009Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
> 2021-12-23T02:54:46.2913589Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
> 2021-12-23T02:54:46.2914162Z Dec 23 02:54:46  at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28502&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae&l=14634
> Maybe the test instability is caused by exceeding our available memory on the 
> CI machines by running too many tests concurrently.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
