nmz0324 opened a new issue #5598:
URL: https://github.com/apache/dolphinscheduler/issues/5598


   The Spark program runs successfully on YARN, but DolphinScheduler reports some task instances as successful and others as failed.
   DolphinScheduler version: 1.3.5
   Worker log (work.log):
        
   21/06/07 17:25:10 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://master:8020/user/hive/warehouse/llys.db/d_meter_info
   21/06/07 17:25:10 INFO spark.SparkContext: Invoking stop() from shutdown hook
   21/06/07 17:25:10 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.xxxxxxxx:4040
   21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
   21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
   21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
   21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Stopped
   21/06/07 17:25:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   21/06/07 17:25:10 INFO storage.MemoryStore: MemoryStore cleared
   21/06/07 17:25:10 INFO storage.BlockManager: BlockManager stopped
   21/06/07 17:25:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
   21/06/07 17:25:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   21/06/07 17:25:10 INFO spark.SparkContext: Successfully stopped SparkContext
   21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
   21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a62daf3e-951a-4906-8e69-2efcc7688362
   21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-58e66c77-b8a1-40ef-a9bf-5d9ca39b418f
   [INFO] 2021-06-07 17:25:11.098  - [taskAppId=TASK-943-539-746]:[125] - FINALIZE_SESSION
   [INFO] 2021-06-07 17:25:11.109  - [taskAppId=TASK-943-539-746]:[431] - find app id: application_1623056438401_0003
   [INFO] 2021-06-07 17:25:11.113 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task instance id : 746,task final status : FAILURE
   [INFO] 2021-06-07 17:25:11.116 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[161] - develop mode is: false
   [INFO] 2021-06-07 17:25:11.119 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[179] - exec local path: /tmp/dolphinscheduler/exec/process/61/943/539/746 cleared.
   
   YARN log:
   
   
   Log Type: stderr
   
   Log Upload Time: Mon Jun 07 17:25:13 +0800 2021
   
   Log Length: 68880
   
   Showing 4096 bytes of 68880 total.
   
   _1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
   21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
   21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
   21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
   21/06/07 17:24:54 INFO yarn.YarnAllocator: Completed container container_1623056438401_0003_01_000004 on host: worker01 (state: COMPLETE, exit status: 1)
   21/06/07 17:24:54 WARN yarn.YarnAllocator: Container marked as failed: container_1623056438401_0003_01_000004 on host: worker01. Exit status: 1. Diagnostics: Exception from container-launch.
   Container id: container_1623056438401_0003_01_000004
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
        at org.apache.hadoop.util.Shell.run(Shell.java:507)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   
   
   Container exited with a non-zero exit code 1
   
   21/06/07 17:24:57 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 11264 MB memory (including 1024 MB of overhead)
   21/06/07 17:24:57 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
   21/06/07 17:24:59 INFO yarn.YarnAllocator: Launching container container_1623056438401_0003_01_000007 on host master
   21/06/07 17:24:59 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
   21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Preparing Local resources
   21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file: "/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
   21/06/07 17:25:10 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
   21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.xx.xx:60916
   21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. worker03:60916
   21/06/07 17:25:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
   21/06/07 17:25:10 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
   21/06/07 17:25:10 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
   21/06/07 17:25:10 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1623056438401_0003
   21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
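
   For anyone triaging similar reports, the two logs above can be cross-checked mechanically. The following is only a rough sketch, not part of DolphinScheduler: it pulls the application id and the scheduler's final status from the worker log, the final status from the YARN ApplicationMaster log, and flags a disagreement. The function names, the regular expressions, and the assumption that the scheduler's success value is spelled `SUCCESS` are all illustrative; only `FAILURE` and `SUCCEEDED` appear in the logs quoted here.

```python
import re
from typing import Dict, Optional, Pattern

# Patterns modelled on the log lines quoted in this issue (illustrative only).
APP_ID_RE = re.compile(r"find app id:\s*(application_\d+_\d+)")
SCHEDULER_STATUS_RE = re.compile(r"task final status\s*:\s*(\w+)")
YARN_STATUS_RE = re.compile(r"Final app status:\s*(\w+)")


def _first(pattern: Pattern[str], log_text: str) -> Optional[str]:
    """Return the first captured group, tolerating hard-wrapped log lines."""
    flat = " ".join(log_text.split())  # collapse newlines and runs of spaces
    match = pattern.search(flat)
    return match.group(1) if match else None


def compare_statuses(worker_log: str, yarn_log: str) -> Dict[str, object]:
    """Summarise what each side reported and whether they disagree."""
    app_id = _first(APP_ID_RE, worker_log)
    scheduler_status = _first(SCHEDULER_STATUS_RE, worker_log)
    yarn_status = _first(YARN_STATUS_RE, yarn_log)
    both_known = scheduler_status is not None and yarn_status is not None
    # Assumption: "SUCCESS" is the scheduler's success value.
    mismatch = both_known and (
        (scheduler_status == "SUCCESS") != (yarn_status == "SUCCEEDED")
    )
    return {
        "app_id": app_id,
        "scheduler_status": scheduler_status,
        "yarn_status": yarn_status,
        "mismatch": mismatch,
    }


# Excerpts copied from the logs in this issue, wrapped as they were pasted.
WORKER_EXCERPT = """
[INFO] 2021-06-07 17:25:11.109  - [taskAppId=TASK-943-539-746]:[431] - find
app id: application_1623056438401_0003
[INFO] 2021-06-07 17:25:11.113
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task
instance id : 746,task final status : FAILURE
"""

YARN_EXCERPT = """
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0
"""

if __name__ == "__main__":
    print(compare_statuses(WORKER_EXCERPT, YARN_EXCERPT))
```

   For the excerpts above, the summary reports `FAILURE` on the scheduler side and `SUCCEEDED` on the YARN side for the same application id. The same application id could also be looked up through the ResourceManager REST API (`/ws/v1/cluster/apps/{appid}`, field `finalStatus`) to confirm what YARN recorded.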
   
   
   
![微信图片_20210607175004](https://user-images.githubusercontent.com/73994723/120998171-68bd1a00-c7ba-11eb-8e03-d5e9f00cc59c.png)
   
![微信图片_20210607180117](https://user-images.githubusercontent.com/73994723/120998192-6e1a6480-c7ba-11eb-8ce2-79451723be55.png)
   
![微信图片_20210607180131](https://user-images.githubusercontent.com/73994723/120998198-707cbe80-c7ba-11eb-813a-d73672ab9439.png)
   
   
   
   
   

