github-actions[bot] commented on issue #7304: URL: https://github.com/apache/dolphinscheduler/issues/7304#issuecomment-990576082
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

When my DolphinScheduler on cluster A executes a Yarn task on cluster B via SSH, DolphinScheduler monitors the application_id, but that application_id cannot be found on cluster A (because the job runs on cluster B's Yarn). As a result, some tasks are reported as failed even though they actually complete normally. The log is as follows:

```
[INFO] 2021-12-10 11:12:18.415-[taskAppId=TASK-118-6871-11452]:[138]--> SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cslc/apache-hive-2.0.0-bin/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cslc/apache-hive-2.0.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cslc/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[INFO] 2021-12-10 11:12:25.417-[taskAppId=TASK-118-6871-11452]:[138]--> Logging initialized using configuration in file:/opt/cslc/apache-hive-2.0.0-bin/conf/hive-log4j2.properties
[INFO] 2021-12-10 11:12:39.419-[taskAppId=TASK-118-6871-11452]:[138]--> OK
Time taken: 3.464 seconds
[INFO] 2021-12-10 11:12:41.420-[taskAppId=TASK-118-6871-11452]:[138]--> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (ie spark, tez) or using Hive 1.X releases.
Query ID = dip_20211210111235_08db789d-05e7-43ab-aba9-f7984774e4d4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
[INFO] 2021-12-10 11:12:42.421-[taskAppId=TASK-118-6871-11452]:[138]--> Starting Job = job_1631871075019_193985, Tracking URL = http://pdip002:8188/proxy/application_1631871075019_193985/
Kill Command = /opt/cslc/hadoop-2.7.2/bin/hadoop job -kill job_1631871075019_193985
[INFO] 2021-12-10 11:12:52.423-[taskAppId=TASK-118-6871-11452]:[138]--> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-12-10 11:12:51,695 Stage-1 map = 0%, reduce = 0%
[INFO] 2021-12-10 11:12:58.424-[taskAppId=TASK-118-6871-11452]:[138]--> 2021-12-10 11:12:58,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.57 sec
[INFO] 2021-12-10 11:13:04.425-[taskAppId=TASK-118-6871-11452]:[138]--> 2021-12-10 11:13:04,406 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.01 sec
[INFO] 2021-12-10 11:13:06.426-[taskAppId=TASK-118-6871-11452]:[138]--> MapReduce Total cumulative CPU time: 8 seconds 10 msec
Ended Job = job_1631871075019_193985
[INFO] 2021-12-10 11:13:06.996-[taskAppId=TASK-118-6871-11452]:[447]-find app id: application_1631871075019_193985
[INFO] 2021-12-10 11:13:06.996-[taskAppId=TASK-118-6871-11452]:[404]-check yarn application status, appId:application_1631871075019_193985
[ERROR] 2021-12-10 11:13:07.014-[taskAppId=TASK-118-6871-11452]:[420]-yarn applications: application_1631871075019_193985, query status failed, exception:{}
java.lang.NullPointerException: null
	at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:406)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
	at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
	at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[INFO] 2021-12-10 11:13:07.014-[taskAppId=TASK-118-6871-11452]:[238]-process has exited, execute path:/cslc/dip001/dolphinscheduler_exec/exec/process/2/118/6871/11452, processId:40838 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-12-10 11:13:07.427-[taskAppId=TASK-118-6871-11452]:[138]--> MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.01 sec   HDFS Read: 97430 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
507
Time taken: 27.786 seconds, Fetched: 1 row(s)
```

### What you expected to happen

Should the status of the Yarn task's application_id really be checked in this case?

### How to reproduce

Deploy DolphinScheduler on cluster A and execute Yarn tasks on cluster B via SSH (Yarn is deployed on both cluster A and cluster B; an example ssh command: `ssh user@host "command"`). DolphinScheduler monitors the application_id, but it cannot be found on cluster A (because it is on cluster B's Yarn), which causes some tasks to be reported as failed even though they actually complete normally.

### Anything else

_No response_

### Version

1.3.9

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
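For context on the mechanism behind the bug: the worker scans the task's log output (including output relayed back over SSH) for YARN application IDs matching `application_<clusterTimestamp>_<sequence>`, and then queries each ID against the ResourceManager it is configured for, which here is cluster A's. A minimal sketch of the extraction side, under the assumption that a simple regex scan is used (the class and method names below are illustrative, not the project's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppIdExtractor {

    // YARN application IDs look like application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID_PATTERN =
            Pattern.compile("application_\\d+_\\d+");

    /**
     * Scan task log output for YARN application IDs, keeping each ID once,
     * in order of first appearance. Note the scan cannot tell WHICH cluster
     * the application belongs to -- that is the root of this issue.
     */
    public static List<String> extractAppIds(String logText) {
        List<String> appIds = new ArrayList<>();
        Matcher matcher = APP_ID_PATTERN.matcher(logText);
        while (matcher.find()) {
            String appId = matcher.group();
            if (!appIds.contains(appId)) {
                appIds.add(appId);
            }
        }
        return appIds;
    }

    public static void main(String[] args) {
        String log = "Starting Job = job_1631871075019_193985, Tracking URL = "
                + "http://pdip002:8188/proxy/application_1631871075019_193985/";
        // prints [application_1631871075019_193985]
        System.out.println(extractAppIds(log));
    }
}
```

Because the extracted ID carries no cluster information, the subsequent status query goes to cluster A's ResourceManager, which has never seen the application, and the lookup fails with the NullPointerException shown in the log above.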
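One possible mitigation, sketched purely for discussion (the method and the simplified state handling are assumptions, not DolphinScheduler's actual code): when the configured ResourceManager does not know the application, fall back to the shell process's exit code instead of letting the null lookup result propagate as a failure.

```java
public class YarnStateFallback {

    /**
     * Decide task success when the YARN status lookup can come back empty
     * because the application ran on a different cluster.
     *
     * @param applicationState final state reported by the configured
     *                         ResourceManager, or null if the application is
     *                         unknown to it (e.g. submitted via SSH to
     *                         another cluster)
     * @param processExitCode  exit code of the local shell process
     */
    public static boolean isSuccess(String applicationState, int processExitCode) {
        if (applicationState == null) {
            // Application not visible on this cluster: trust the shell
            // process exit code instead of reporting a spurious failure.
            return processExitCode == 0;
        }
        // Simplified: treat only a FINISHED final state as success.
        return "FINISHED".equals(applicationState);
    }

    public static void main(String[] args) {
        System.out.println(isSuccess(null, 0));       // true: fall back to exit code
        System.out.println(isSuccess("FINISHED", 0)); // true
        System.out.println(isSuccess("FAILED", 0));   // false
    }
}
```

An alternative, heavier fix would be a per-task switch that disables the YARN status check entirely for tasks known to submit to a remote cluster; either way the exit code of the SSH command remains the only signal the worker can actually observe locally.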
