kingkongpoon opened a new issue #2557:
URL: https://github.com/apache/hudi/issues/2557


   I run my job with Spark on YARN and have tried both COW and MOR table types.
   The first write to the COW table (SaveMode.Overwrite) is very fast (about
   700MB of data in HDFS), but when I run an incremental write
   (SaveMode.Append), it is very slow and throws an error like:
   ```
   Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
        at org.apache.hadoop.util.Shell.run(Shell.java:869)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   ```
   21/02/09 16:52:21 ERROR [dispatcher-event-loop-11] YarnScheduler: Lost executor 10 on node1: Container from a bad node: container_e10_1610102487810_33748_01_000012 on host: node1. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/09 16:52:23 ERROR [dispatcher-event-loop-11] YarnScheduler: Lost executor 4 on node1: Container from a bad node: container_e10_1610102487810_33748_01_000005 on host: node1. Exit status: 1. Diagnostics: Exception from container-launch.
   Container id: container_e10_1610102487810_33748_01_000005
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
        at org.apache.hadoop.util.Shell.run(Shell.java:869)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   
   
   Container exited with a non-zero exit code 1
   ```
   My cluster's total memory is about 40GB.
   
   I submit with this configuration:
   spark-submit --master yarn --driver-memory 4G --executor-memory 8G --executor-cores 4 --num-executors 10
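
   For context, exit code 137 is 128 + SIGKILL, which on YARN usually points at memory pressure. A rough back-of-the-envelope check of the memory this submit command asks for versus the ~40GB cluster, assuming Spark's default `spark.executor.memoryOverhead` of max(384 MB, 10% of container memory) (an assumption, since the cluster's actual overhead settings are not shown):

   ```python
   # Sketch only: compares the memory requested by the spark-submit flags above
   # against the ~40GB cluster. Assumes the default YARN overhead formula
   # max(384MB, 0.10 * heap); real values depend on this cluster's config.

   def container_mb(heap_gb: float) -> float:
       """Heap plus assumed YARN overhead for one Spark container, in MB."""
       heap_mb = heap_gb * 1024
       overhead_mb = max(384, 0.10 * heap_mb)
       return heap_mb + overhead_mb

   executors = 10
   # 10 executors at 8G each, plus the 4G driver
   requested_mb = executors * container_mb(8) + container_mb(4)
   print(f"requested ≈ {requested_mb / 1024:.1f} GB vs ~40 GB available")
   ```

   If the total requested exceeds what the cluster can grant, YARN may kill or fail to launch containers; this is only a sketch, since the real overhead and scheduler settings here are unknown.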


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

