nmz0324 opened a new issue #5598:
URL: https://github.com/apache/dolphinscheduler/issues/5598
Spark programs finish successfully on YARN, but DolphinScheduler (version
1.3.5) reports some of them as successful and others as failed.
work.log
21/06/07 17:25:10 INFO common.FileUtils: Creating directory if it doesn't
exist: hdfs://master:8020/user/hive/warehouse/llys.db/d_meter_info
21/06/07 17:25:10 INFO spark.SparkContext: Invoking stop() from
shutdown hook
21/06/07 17:25:10 INFO ui.SparkUI: Stopped Spark web UI at
http://192.168.xxxxxxxx:4040
21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Interrupting
monitor thread
21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Shutting
down all executors
21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Asking each
executor to shut down
21/06/07 17:25:10 INFO cluster.YarnClientSchedulerBackend: Stopped
21/06/07 17:25:10 INFO spark.MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
21/06/07 17:25:10 INFO storage.MemoryStore: MemoryStore cleared
21/06/07 17:25:10 INFO storage.BlockManager: BlockManager stopped
21/06/07 17:25:10 INFO storage.BlockManagerMaster: BlockManagerMaster
stopped
21/06/07 17:25:10 INFO
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
21/06/07 17:25:10 INFO spark.SparkContext: Successfully stopped
SparkContext
21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory
/tmp/spark-a62daf3e-951a-4906-8e69-2efcc7688362
21/06/07 17:25:10 INFO util.ShutdownHookManager: Deleting directory
/tmp/spark-58e66c77-b8a1-40ef-a9bf-5d9ca39b418f
[INFO] 2021-06-07 17:25:11.098 - [taskAppId=TASK-943-539-746]:[125] -
FINALIZE_SESSION
[INFO] 2021-06-07 17:25:11.109 - [taskAppId=TASK-943-539-746]:[431] - find
app id: application_1623056438401_0003
[INFO] 2021-06-07 17:25:11.113
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[141] - task
instance id : 746,task final status : FAILURE
[INFO] 2021-06-07 17:25:11.116
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[161] -
develop mode is: false
[INFO] 2021-06-07 17:25:11.119
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[179] - exec
local path: /tmp/dolphinscheduler/exec/process/61/943/539/746 cleared.
yarn
Log Type: stderr
Log Upload Time: Mon Jun 07 17:25:13 +0800 2021
Log Length: 68880
Showing 4096 bytes of 68880 total.
_1623056438401_0003/__spark_conf__8945468690948841211.zip" } size: 30986
timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources
Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file:
"/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip"
} size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources
Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file:
"/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip"
} size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:51 INFO yarn.ExecutorRunnable: Prepared Local resources
Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file:
"/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip"
} size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:24:54 INFO yarn.YarnAllocator: Completed container
container_1623056438401_0003_01_000004 on host: worker01 (state: COMPLETE, exit
status: 1)
21/06/07 17:24:54 WARN yarn.YarnAllocator: Container marked as failed:
container_1623056438401_0003_01_000004 on host: worker01. Exit status: 1.
Diagnostics: Exception from container-launch.
Container id: container_1623056438401_0003_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
21/06/07 17:24:57 INFO yarn.YarnAllocator: Will request 1 executor
container(s), each with 1 core(s) and 11264 MB memory (including 1024 MB of
overhead)
21/06/07 17:24:57 INFO yarn.YarnAllocator: Submitted 1 unlocalized container
requests.
21/06/07 17:24:59 INFO yarn.YarnAllocator: Launching container
container_1623056438401_0003_01_000007 on host master
21/06/07 17:24:59 INFO yarn.YarnAllocator: Received 1 containers from YARN,
launching executors on 1 of them.
21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Preparing Local resources
21/06/07 17:24:59 INFO yarn.ExecutorRunnable: Prepared Local resources
Map(__spark_conf__ -> resource { scheme: "hdfs" host: "master" port: 8020 file:
"/user/root/.sparkStaging/application_1623056438401_0003/__spark_conf__8945468690948841211.zip"
} size: 30986 timestamp: 1623057888483 type: ARCHIVE visibility: PRIVATE)
21/06/07 17:25:10 INFO yarn.YarnAllocator: Driver requested a total number
of 0 executor(s).
21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated
or disconnected! Shutting down. 192.168.xx.xx:60916
21/06/07 17:25:10 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated
or disconnected! Shutting down. worker03:60916
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,
exitCode: 0
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Unregistering
ApplicationMaster with SUCCEEDED
21/06/07 17:25:10 INFO impl.AMRMClientImpl: Waiting for application to be
successfully unregistered.
21/06/07 17:25:10 INFO yarn.ApplicationMaster: Deleting staging directory
.sparkStaging/application_1623056438401_0003
21/06/07 17:25:10 INFO util.ShutdownHookManager: Shutdown hook called
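
The mismatch between the two logs above (worker log: "task final status : FAILURE"; YARN ApplicationMaster log: "Final app status: SUCCEEDED, exitCode: 0") can be checked mechanically across many runs. The following is only a rough sketch, assuming the log lines keep exactly the wording quoted in this report. The function names are hypothetical helpers and are not part of DolphinScheduler, Spark, or YARN.

```python
import re

# Patterns match the log wording quoted in this report. Assumption: the
# wording is the same in other runs; it may differ between versions.
DS_STATUS = re.compile(r"task\s+final\s+status\s*:\s*(\w+)")
YARN_STATUS = re.compile(r"Final\s+app\s+status:\s*(\w+),\s*exitCode:\s*(-?\d+)")


def _normalize(text):
    # The pasted logs are hard-wrapped, so collapse whitespace runs first.
    return " ".join(text.split())


def ds_task_status(worker_log):
    """Return the task status the worker logged, or None if not found."""
    m = DS_STATUS.search(_normalize(worker_log))
    return m.group(1) if m else None


def yarn_final_status(am_log):
    """Return (final status, exit code) from the AM log, or None."""
    m = YARN_STATUS.search(_normalize(am_log))
    return (m.group(1), int(m.group(2))) if m else None


def yarn_ok_but_ds_failed(worker_log, am_log):
    """True when YARN logged SUCCEEDED while the worker logged FAILURE."""
    ds = ds_task_status(worker_log)
    yarn = yarn_final_status(am_log)
    return ds == "FAILURE" and yarn is not None and yarn[0] == "SUCCEEDED"
```

Running yarn_ok_but_ds_failed over the worker log and the stderr log of each task instance would separate the runs that show this disagreement from the ones that do not. It does not explain the cause.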