Baogang Wang created SPARK-10145:
------------------------------------

             Summary: Executor exits without useful messages when running a Spark Streaming application
                 Key: SPARK-10145
                 URL: https://issues.apache.org/jira/browse/SPARK-10145
             Project: Spark
          Issue Type: Bug
          Components: Streaming, YARN
         Environment: Spark 1.3.1, Hadoop 2.6.0, 6 nodes, each with 32 cores and 32 GB memory
            Reporter: Baogang Wang
            Priority: Critical


YARN allocates 30 GB of memory on each node.
My application receives messages from Kafka through a direct stream. Each
application consists of 4 DStream windows.
The Spark application is submitted with this command:
spark-submit --class spark_security.safe.SafeSockPuppet \
  --driver-memory 3g --executor-memory 3g \
  --num-executors 3 --executor-cores 4 \
  --name safeSparkDealerUser \
  --master yarn --deploy-mode cluster \
  spark_Security-1.0-SNAPSHOT.jar.nocalse \
  hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties

After about 1 hour, some executors exit. No further YARN logs are written after
an executor exits, and no stack trace is printed when it exits.
The YARN NodeManager log shows the following:


2015-08-17 17:25:41,550 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1439803298368_0005_01_000001 by user root
2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1439803298368_0005
2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.19.160.102       OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1439803298368_0005    CONTAINERID=container_1439803298368_0005_01_000001
2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1439803298368_0005 transitioned from NEW to INITING
2015-08-17 17:25:41,552 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1439803298368_0005_01_000001 to application application_1439803298368_0005
2015-08-17 17:25:41,557 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
2015-08-17 17:25:41,663 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1439803298368_0005 transitioned from INITING to RUNNING
2015-08-17 17:25:41,664 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000001 transitioned from NEW to LOCALIZING
2015-08-17 17:25:41,664 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1439803298368_0005
2015-08-17 17:25:41,664 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1439803298368_0005_01_000001
2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar transitioned from INIT to DOWNLOADING
2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar transitioned from INIT to DOWNLOADING
2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1439803298368_0005_01_000001
2015-08-17 17:25:41,668 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_000001.tokens. Credentials list: 
2015-08-17 17:25:41,682 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user root
2015-08-17 17:25:41,686 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_000001.tokens to /export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005/container_1439803298368_0005_01_000001.tokens
2015-08-17 17:25:41,686 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005 = file:/export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005
2015-08-17 17:25:42,240 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar(->/export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/filecache/14/spark-assembly-1.3.1-hadoop2.6.0.jar) transitioned from DOWNLOADING to LOCALIZED
2015-08-17 17:25:42,508 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar(->/export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/filecache/15/spark_Security-1.0-SNAPSHOT.jar) transitioned from DOWNLOADING to LOCALIZED
2015-08-17 17:25:42,508 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000001 transitioned from LOCALIZING to LOCALIZED
2015-08-17 17:25:42,548 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000001 transitioned from LOCALIZED to RUNNING
................................................
2015-08-17 17:26:20,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1439803298368_0005_01_000003 by user root
2015-08-17 17:26:20,367 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1439803298368_0005_01_000003 to application application_1439803298368_0005
2015-08-17 17:26:20,368 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000003 transitioned from NEW to LOCALIZING
2015-08-17 17:26:20,368 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1439803298368_0005
2015-08-17 17:26:20,368 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1439803298368_0005_01_000003
2015-08-17 17:26:20,369 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000003 transitioned from LOCALIZING to LOCALIZED
2015-08-17 17:26:20,370 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.19.160.102       OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1439803298368_0005    CONTAINERID=container_1439803298368_0005_01_000003
2015-08-17 17:26:20,443 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000003 transitioned from LOCALIZED to RUNNING
2015-08-17 17:26:20,443 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
2015-08-17 17:26:20,449 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005/container_1439803298368_0005_01_000003/default_container_executor.sh]
..........................................
   
2015-08-18 01:50:30,297 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1439803298368_0005_01_000003 succeeded 
2015-08-18 01:50:30,440 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000003 transitioned from RUNNING to EXITED_WITH_SUCCESS
2015-08-18 01:50:30,465 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1439803298368_0005_01_000003
2015-08-18 01:50:35,046 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1439803298368_0005    CONTAINERID=container_1439803298368_0005_01_000003
2015-08-18 01:50:35,062 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_000003 transitioned from EXITED_WITH_SUCCESS to DONE
2015-08-18 01:50:35,065 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1439803298368_0005_01_000003 from application application_1439803298368_0005
2015-08-18 01:50:35,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory monitoring is needed. Not running the monitor-thread
2015-08-18 01:50:35,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_1439803298368_0005_01_000003 for log-aggregation
2015-08-18 01:50:35,089 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1439803298368_0005
2015-08-18 01:50:35,099 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_1439803298368_0005_01_000003
2015-08-18 01:50:35,105 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /export/servers/hadoop2.6.0/tmp/nm-local-dir/usercache/root/appcache/application_1439803298368_0005/container_1439803298368_0005_01_000003
2015-08-18 01:50:47,601 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1439803298368_0005_01_000001 is : 15
2015-08-18 01:50:48,401 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1439803298368_0005_01_000001 and exit code: 15
ExitCodeException exitCode=15: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

container_1439803298368_0005_01_000003 was started at 2015-08-17 17:26:20 and
ran normally. At 2015-08-18 01:50:30 it was reported as succeeded, and it was
finally stopped (CONTAINER_STOP).
container_1439803298368_0005_01_000001 was started at 2015-08-17 17:25:42. At
2015-08-18 01:50:48 it exited suddenly with exit code 15.

According to the NodeManager log, container_1439803298368_0005_01_000003
transitioned from RUNNING to EXITED_WITH_SUCCESS, so YARN recorded its exit as
successful and logged no error for it.
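The container lifetimes above were read off the NodeManager log by hand. As a
minimal sketch (not part of Spark or Hadoop), the Python below does the same
bookkeeping, assuming the raw NodeManager log keeps one entry per line (the
wrapping in this report comes from the e-mail) and that the messages look like
the excerpts quoted here. The names summarize, SAMPLE_LINES and the regular
expressions are hypothetical.

```python
import re
from datetime import datetime

# Assumption: one NodeManager log entry per line, formatted like the excerpts above.
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
TRANSITION = re.compile(
    TS + r" INFO \S+: Container (container_\w+) transitioned from (\w+) to (\w+)"
)
EXIT_CODE = re.compile(
    TS + r" WARN \S+: Exit code from container (container_\w+) is : (\d+)"
)


def parse_ts(text):
    # NodeManager timestamps look like "2015-08-18 01:50:30,440".
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def summarize(lines):
    """Collect RUNNING time, end time, final state and exit code per container."""
    containers = {}
    for line in lines:
        m = TRANSITION.match(line)
        if m:
            ts, cid, old, new = m.groups()
            entry = containers.setdefault(cid, {})
            if new == "RUNNING":
                entry["running"] = parse_ts(ts)
            elif old == "RUNNING":
                # Leaving RUNNING marks the end of the container process.
                entry["ended"] = parse_ts(ts)
                entry["final_state"] = new
            continue
        m = EXIT_CODE.match(line)
        if m:
            ts, cid, code = m.groups()
            entry = containers.setdefault(cid, {})
            entry["exit_code"] = int(code)
            # Keep an earlier end time if a state transition already set one.
            entry.setdefault("ended", parse_ts(ts))
    return containers


# Hypothetical input: a few entries copied from the excerpts above, unwrapped.
NM = "org.apache.hadoop.yarn.server.nodemanager"
CONTAINER = NM + ".containermanager.container.Container"
C000001 = "container_1439803298368_0005_01_000001"
C000003 = "container_1439803298368_0005_01_000003"
SAMPLE_LINES = [
    f"2015-08-17 17:25:42,548 INFO {CONTAINER}: Container {C000001} "
    "transitioned from LOCALIZED to RUNNING",
    f"2015-08-17 17:26:20,443 INFO {CONTAINER}: Container {C000003} "
    "transitioned from LOCALIZED to RUNNING",
    f"2015-08-18 01:50:30,440 INFO {CONTAINER}: Container {C000003} "
    "transitioned from RUNNING to EXITED_WITH_SUCCESS",
    f"2015-08-18 01:50:35,062 INFO {CONTAINER}: Container {C000003} "
    "transitioned from EXITED_WITH_SUCCESS to DONE",
    f"2015-08-18 01:50:47,601 WARN {NM}.DefaultContainerExecutor: "
    f"Exit code from container {C000001} is : 15",
]

if __name__ == "__main__":
    for cid, entry in sorted(summarize(SAMPLE_LINES).items()):
        lifetime = entry["ended"] - entry["running"]
        print(cid, "ran for", lifetime,
              "state:", entry.get("final_state", "n/a"),
              "exit code:", entry.get("exit_code", "n/a"))
```

On these sample entries it reports container _000003 as ending in
EXITED_WITH_SUCCESS with no exit code logged, and container _000001 as having
no logged transition out of RUNNING but an exit code of 15, which matches the
observations in this report.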




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
