[ https://issues.apache.org/jira/browse/FLINK-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900419#comment-14900419 ]
Robert Metzger commented on FLINK-2392:
---------------------------------------
This one failed because the TaskManager ran out of heap memory (OutOfMemoryError) while starting up:
{code}
16:58:16,631 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Error while starting up taskManager
java.lang.Exception: OutOfMemory error (Java heap space) while allocating the TaskManager heap memory (17973782 bytes).
	at org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1651)
	at org.apache.flink.runtime.taskmanager.TaskManager$.runTaskManager(TaskManager.scala:1460)
	at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1326)
	at org.apache.flink.runtime.taskmanager.TaskManager.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala)
	at org.apache.flink.yarn.appMaster.YarnTaskManagerRunner$1.run(YarnTaskManagerRunner.java:99)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
	at org.apache.flink.yarn.appMaster.YarnTaskManagerRunner.main(YarnTaskManagerRunner.java:95)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.flink.runtime.memory.MemoryManager$HeapMemoryPool.<init>(MemoryManager.java:611)
	at org.apache.flink.runtime.memory.MemoryManager.<init>(MemoryManager.java:163)
	at org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1640)
	... 8 more
16:58:16,634 ERROR org.apache.flink.yarn.appMaster.YarnTaskManagerRunner - Error while starting the TaskManager
java.lang.Exception: OutOfMemory error (Java heap space) while allocating the TaskManager heap memory (17973782 bytes).
	at org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1651)
	at org.apache.flink.runtime.taskmanager.TaskManager$.runTaskManager(TaskManager.scala:1460)
	at org.apache.flink.runtime.taskmanager.TaskManager$.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala:1326)
	at org.apache.flink.runtime.taskmanager.TaskManager.selectNetworkInterfaceAndRunTaskManager(TaskManager.scala)
	at org.apache.flink.yarn.appMaster.YarnTaskManagerRunner$1.run(YarnTaskManagerRunner.java:99)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
	at org.apache.flink.yarn.appMaster.YarnTaskManagerRunner.main(YarnTaskManagerRunner.java:95)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.flink.runtime.memory.MemoryManager$HeapMemoryPool.<init>(MemoryManager.java:611)
	at org.apache.flink.runtime.memory.MemoryManager.<init>(MemoryManager.java:163)
	at org.apache.flink.runtime.taskmanager.TaskManager$.startTaskManagerComponentsAndActor(TaskManager.scala:1640)
	... 8 more
16:58:16,639 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
{code}
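The trace shows the OutOfMemoryError being raised inside the constructor of MemoryManager$HeapMemoryPool, i.e. while the TaskManager allocates its managed heap memory at startup, and then being wrapped into a java.lang.Exception. The following is a minimal sketch of that allocate-up-front pattern, not Flink's actual MemoryManager code; HeapPoolSketch, fitsInMaxHeap and the 32 KiB page size used below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Flink's MemoryManager: a heap pool that allocates all of
// its pages up front and wraps an OutOfMemoryError into a descriptive Exception.
final class HeapPoolSketch {

    private final List<byte[]> pages = new ArrayList<>();

    HeapPoolSketch(long totalBytes, int pageSize) throws Exception {
        if (totalBytes < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and pageSize > 0");
        }
        // Only whole pages are allocated; a remainder smaller than one page is dropped.
        long numPages = totalBytes / pageSize;
        try {
            for (long i = 0; i < numPages; i++) {
                // Each page is a separate byte[] on the JVM heap.
                pages.add(new byte[pageSize]);
            }
        } catch (OutOfMemoryError e) {
            // Drop references to what was allocated so far, then report the requested size.
            pages.clear();
            throw new Exception("OutOfMemory error (" + e.getMessage()
                    + ") while allocating the TaskManager heap memory ("
                    + totalBytes + " bytes).", e);
        }
    }

    int numPages() {
        return pages.size();
    }

    // Rough pre-flight check: could the requested pool fit into the remaining max heap?
    // Current usage includes garbage not yet collected, so this errs on the pessimistic side.
    static boolean fitsInMaxHeap(long requestedBytes) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return requestedBytes <= rt.maxMemory() - used;
    }
}
```

With the 17973782 bytes from the log and an assumed 32768-byte page size, the sketch allocates 548 whole pages. Because the whole pool is requested at startup, a container JVM whose -Xmx is too small for the pool plus everything else fails immediately, as in the log above, rather than later under load.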
> Instable test in flink-yarn-tests
> ---------------------------------
>
> Key: FLINK-2392
> URL: https://issues.apache.org/jira/browse/FLINK-2392
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Reporter: Matthias J. Sax
> Assignee: Robert Metzger
> Priority: Critical
> Labels: test-stability
>
> The test YARNSessionFIFOITCase fails intermittently.
> For example, see: https://travis-ci.org/apache/flink/jobs/72019690
> {noformat}
> Tests run: 12, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 205.163 sec <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> perJobYarnClusterWithParallelism(org.apache.flink.yarn.YARNSessionFIFOITCase) Time elapsed: 60.651 sec <<< FAILURE!
> java.lang.AssertionError: During the timeout period of 60 seconds the expected string did not show up
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:478)
> 	at org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnClusterWithParallelism(YARNSessionFIFOITCase.java:435)
> Results :
> Failed tests:
>   YARNSessionFIFOITCase.perJobYarnClusterWithParallelism:435->YarnTestBase.runWithArgs:478 During the timeout period of 60 seconds the expected string did not show up
> {noformat}
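The assertion above comes from YarnTestBase.runWithArgs, which waits up to 60 seconds for an expected string to show up (presumably in the captured output of the started process). A minimal polling sketch of that kind of check follows; OutputWaiter, waitForString and its parameters are hypothetical, and this is not the actual YarnTestBase code.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, NOT YarnTestBase.runWithArgs: poll a (possibly growing) output
// buffer until it contains the expected string or the timeout expires.
final class OutputWaiter {

    static boolean waitForString(Supplier<String> output, String expected,
                                 Duration timeout, Duration pollInterval)
            throws InterruptedException {
        // System.nanoTime() is monotonic, so wall-clock adjustments cannot shorten the wait.
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (output.get().contains(expected)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller should fail and dump the output
            }
            Thread.sleep(Math.max(1L, pollInterval.toMillis()));
        }
    }
}
```

One design choice worth considering for such a check: on timeout, print the captured output, so that a failure like this one shows whether the process was merely slow or whether, for example, a TaskManager failed to start.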
> Another failure case (see https://travis-ci.org/mjsax/flink/jobs/77313444):
> {noformat}
> Tests run: 12, Failures: 3, Errors: 0, Skipped: 2, Time elapsed: 182.008 sec <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
> testTaskManagerFailure(org.apache.flink.yarn.YARNSessionFIFOITCase) Time elapsed: 27.356 sec <<< FAILURE!
> java.lang.AssertionError: Found a file /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_000003/taskmanager.log with a prohibited string: [Exception, Started [email protected]:8081]
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
> 	at org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testNonexistingQueue(org.apache.flink.yarn.YARNSessionFIFOITCase) Time elapsed: 17.421 sec <<< FAILURE!
> java.lang.AssertionError: Found a file /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_000003/taskmanager.log with a prohibited string: [Exception, Started [email protected]:8081]
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
> 	at org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> testJavaAPI(org.apache.flink.yarn.YARNSessionFIFOITCase) Time elapsed: 11.984 sec <<< FAILURE!
> java.lang.AssertionError: Found a file /home/travis/build/mjsax/flink/flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1440595422559_0007/container_1440595422559_0007_01_000003/taskmanager.log with a prohibited string: [Exception, Started [email protected]:8081]
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:294)
> 	at org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:94)
> {noformat}
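These failures come from YarnTestBase.ensureNoProhibitedStringInLogFiles, which searches the container log files for a list of prohibited strings; all three failing tests report the same taskmanager.log (container_1440595422559_0007_01_000003). A sketch of such a scan that reports the matching file, line number and line, rather than only the file, follows. LogScanner and findProhibited are hypothetical names, and restricting the scan to *.log files is an assumption, not necessarily what the Flink test does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch, NOT Flink's ensureNoProhibitedStringInLogFiles.
final class LogScanner {

    // Returns "file:line: content" for every log line that contains a prohibited string.
    static List<String> findProhibited(Path rootDir, String[] prohibited) throws IOException {
        List<Path> logFiles;
        try (Stream<Path> walk = Files.walk(rootDir)) {
            logFiles = walk.filter(Files::isRegularFile)
                           .filter(p -> p.getFileName().toString().endsWith(".log"))
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : logFiles) {
            // ISO-8859-1 maps every byte to a char, so decoding never fails on odd bytes.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                for (String s : prohibited) {
                    if (line.contains(s)) {
                        hits.add(file + ":" + (i + 1) + ": " + line);
                        break; // report each offending line only once
                    }
                }
            }
        }
        return hits;
    }
}
```

Printing the offending line matters here because a broad token such as "Exception" matches any logged exception, and a message that lists all prohibited strings does not by itself say which one was found.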
> Furthermore, this build failed too:
> https://travis-ci.org/apache/flink/jobs/77313450
> (no error, but Travis terminated the build because it made no progress for 300 seconds; possibly a deadlock?)
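If the hang were a JVM-level deadlock, the JDK's ThreadMXBean can detect it. The sketch below could be called from a watchdog before the 300-second limit is reached, so that the build log contains the stacks of the threads involved; DeadlockProbe is a hypothetical name and this is not part of the Flink tests. It only sees threads of the JVM it runs in, so a TaskManager or JobManager running in a separate YARN container process would need to be inspected separately (for example with jstack), and a hang that is not a lock cycle (such as waiting on a network timeout) will not show up at all.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of the Flink test suite: describe threads that the
// current JVM reports as deadlocked on monitors or ownable synchronizers.
final class DeadlockProbe {

    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        // Request locked monitors and synchronizers so the lock cycle is visible.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have terminated in the meantime
                // ThreadInfo.toString() includes lock details and a (possibly truncated) stack.
                out.add(info.toString());
            }
        }
        return out;
    }
}
```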
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)