lucyilys opened a new issue #1644: URL: https://github.com/apache/incubator-linkis/issues/1644
### Search before asking

- [X] I searched the [issues](https://github.com/apache/incubator-linkis/issues) and found no similar issues.

### Linkis Component

linkis-cg-entrance, linkis-cg-manager, linkis-cg-engineConnplugin, linkis-cg-engineConnManager

### What happened + What you expected to happen

2022-03-04 15:34:20.725 [ERROR] [BaseTaskScheduler-Thread-71 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (79) [run] - Failed to execute task astJob_7_retry_1
org.apache.linkis.orchestrator.ecm.exception.ECMPluginErrorException: errCode: 12003 ,desc: hadoop:9101_16 Failed to async get EngineNode java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.linkis.common.utils.Utils$.aux$1(Utils.scala:191)
    at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:199)
    at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:202)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply$mcV$sp(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap.getAndRemove(EngineAsyncResponseCache.scala:81)
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:156)
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101)
    at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132)
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115)
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62)
    at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
 ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:165) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40) ~[linkis-common-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62) ~[linkis-orchestrator-core-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62) [linkis-orchestrator-core-1.0.3.jar:1.0.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2022-03-04 15:34:20.725 [INFO ] [BaseTaskScheduler-Thread-71 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (86) [transientStatus] - TaskastJob_7_retry_1 status flip error! Cause: Failed to flip from Cancelled to Failed.
2022-03-04 15:34:20.734 [INFO ] [qtp1277477898-206 ] o.a.l.e.r.EntranceRestfulApi (407) [kill] - end to kill job LINKISCLI_hadoop_spark_1
2022-03-04 15:34:20.743 [INFO ] [CodeReheaterNotifyTaskConsumer ] o.a.l.o.s.a.AsyncTaskManager (177) [apply] - user key hadoop-LINKISCLI,spark-2.4.3, executionTaskId execution_7 to addNumber: 1
2022-03-04 15:34:20.743 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (41) [info] - ExecTaskRunner Submit execTask(astJob_7_stage_14) to running
2022-03-04 15:34:20.744 [ERROR] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.GatherStrategyStageInfoExecTask (62) [error] - There are Tasks execution failure of stage astJob_7_stage_14, now mark ExecutionTask as failed
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (341) [onRootTaskResponseEvent] - received rootTaskResponseEvent astJob_7_job_14
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (320) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (336) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3, minusNumber: 0
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - execution_7 change status Inited => Failed.
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.ExecutionImpl (41) [info] - astJob_7_job_14 completed, Now to remove from execTaskToExecutionTasks
2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.e.EngineExecuteAsyncReturn (41) [info] - Job with execId-LINKISCLI_hadoop_spark_1 and subJobId : 17 from orchestrator completed with state ErrorExecuteResponse(21304, Task is Failed,errorMsg: Job be cancelled,null)
2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.j.EntranceExecutionJob (41) [info] - taskID:17execID:LINKISCLI_hadoop_spark_1 change status Running => Cancelled.
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.j.EntranceExecutionJob (334) [close] - job:LINKISCLI_hadoop_spark_1 is closing
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.l.CacheLogWriter (63) [close] - hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log logWriter close
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.l.CacheLogWriter (40) [write] - hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log write first one line log
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - Finished to ExecutionTask(execution_7) with status Failed
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (361) [markExecutionTaskCompleted] - Finished to mark executionTask(execution_7) rootExecTask astJob_7_job_14 to Completed.
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (71) [run] - Failed to execute ExecTask(astJob_7_stage_14)
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (90) [transientStatus] - astJob_7_stage_14 change status Inited => Failed.
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (204) [addCompletedTask] - astJob_7_stage_14 task completed, now remove from taskManager
2022-03-04 15:34:21.891 [INFO ] [RpcMessageScheduler-ThreadPool-93 ] o.a.l.o.e.s.i.DefaultEngineAsyncResponseService (41) [info] - Failed to create engine hadoop:9101_14, can retry true
2022-03-04 15:34:23.100 [WARN ] [BaseTaskScheduler-Thread-65 ] o.a.l.o.e.ComputationEngineConnManager (50) [warn] - mark_2 Failed to askEngineAskRequest time taken (661740), errCode: 12003 ,desc: hadoop:9101_14 Failed to async get EngineNode LinkisRetryException: errCode: 30001 ,desc: Waiting for engineNode:AMEngineNode{nodeStatus=null, lock='null', serviceInstance=ServiceInstance(linkis-cg-engineconn, hadoop:39841), owner='hadoop'}(0df6e0e8-461a-432c-8a63-3042265f4a1b) initialization TimeoutException, already waiting 660000 ms ,ip: hadoop ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
2022-03-04 15:34:23.157 [INFO ] [BaseTaskScheduler-Thread-65 ] o.a.l.o.e.ComputationEngineConnManager (41) [info] - mark_2 received EngineAskAsyncResponse id: hadoop:9101_21 serviceInstance: ServiceInstance(linkis-cg-linkismanager, hadoop:9101)
2022-03-04 15:34:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-15 ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from taskFutureCache size 1
2022-03-04 15:35:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-6 ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from taskFutureCache size 0

### Relevant platform

Same log as under "What happened + What you expected to happen" above.

### Reproduction script

Same log as under "What happened + What you expected to happen" above.

### Anything else

2022-03-04 15:34:20.725 [ERROR] [BaseTaskScheduler-Thread-71 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (79) [run] - Failed to execute task astJob_7_retry_1
org.apache.linkis.orchestrator.ecm.exception.ECMPluginErrorException: errCode: 12003 ,desc: hadoop:9101_16 Failed to async get EngineNode java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.linkis.common.utils.Utils$.aux$1(Utils.scala:191)
    at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:199)
    at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:202)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply$mcV$sp(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
    at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap.getAndRemove(EngineAsyncResponseCache.scala:81)
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:156)
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101)
    at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132)
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115)
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69)
    at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62)
    at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
 ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:165) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40) ~[linkis-common-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62) ~[linkis-orchestrator-core-1.0.3.jar:1.0.3]
    at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62) [linkis-orchestrator-core-1.0.3.jar:1.0.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2022-03-04 15:34:20.725 [INFO ] [BaseTaskScheduler-Thread-71 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (86) [transientStatus] - TaskastJob_7_retry_1 status flip error! Cause: Failed to flip from Cancelled to Failed.
2022-03-04 15:34:20.734 [INFO ] [qtp1277477898-206 ] o.a.l.e.r.EntranceRestfulApi (407) [kill] - end to kill job LINKISCLI_hadoop_spark_1
2022-03-04 15:34:20.743 [INFO ] [CodeReheaterNotifyTaskConsumer ] o.a.l.o.s.a.AsyncTaskManager (177) [apply] - user key hadoop-LINKISCLI,spark-2.4.3, executionTaskId execution_7 to addNumber: 1
2022-03-04 15:34:20.743 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (41) [info] - ExecTaskRunner Submit execTask(astJob_7_stage_14) to running
2022-03-04 15:34:20.744 [ERROR] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.GatherStrategyStageInfoExecTask (62) [error] - There are Tasks execution failure of stage astJob_7_stage_14, now mark ExecutionTask as failed
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (341) [onRootTaskResponseEvent] - received rootTaskResponseEvent astJob_7_job_14
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (320) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (336) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3, minusNumber: 0
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - execution_7 change status Inited => Failed.
2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.ExecutionImpl (41) [info] - astJob_7_job_14 completed, Now to remove from execTaskToExecutionTasks
2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.e.EngineExecuteAsyncReturn (41) [info] - Job with execId-LINKISCLI_hadoop_spark_1 and subJobId : 17 from orchestrator completed with state ErrorExecuteResponse(21304, Task is Failed,errorMsg: Job be cancelled,null)
2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.j.EntranceExecutionJob (41) [info] - taskID:17execID:LINKISCLI_hadoop_spark_1 change status Running => Cancelled.
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.j.EntranceExecutionJob (334) [close] - job:LINKISCLI_hadoop_spark_1 is closing
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.l.CacheLogWriter (63) [close] - hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log logWriter close
2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.e.l.CacheLogWriter (40) [write] - hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log write first one line log
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - Finished to ExecutionTask(execution_7) with status Failed
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (361) [markExecutionTaskCompleted] - Finished to mark executionTask(execution_7) rootExecTask astJob_7_job_14 to Completed.
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (71) [run] - Failed to execute ExecTask(astJob_7_stage_14)
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (90) [transientStatus] - astJob_7_stage_14 change status Inited => Failed.
2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74 ] o.a.l.o.s.a.AsyncTaskManager (204) [addCompletedTask] - astJob_7_stage_14 task completed, now remove from taskManager
2022-03-04 15:34:21.891 [INFO ] [RpcMessageScheduler-ThreadPool-93 ] o.a.l.o.e.s.i.DefaultEngineAsyncResponseService (41) [info] - Failed to create engine hadoop:9101_14, can retry true
2022-03-04 15:34:23.100 [WARN ] [BaseTaskScheduler-Thread-65 ] o.a.l.o.e.ComputationEngineConnManager (50) [warn] - mark_2 Failed to askEngineAskRequest time taken (661740), errCode: 12003 ,desc: hadoop:9101_14 Failed to async get EngineNode LinkisRetryException: errCode: 30001 ,desc: Waiting for engineNode:AMEngineNode{nodeStatus=null, lock='null', serviceInstance=ServiceInstance(linkis-cg-engineconn, hadoop:39841), owner='hadoop'}(0df6e0e8-461a-432c-8a63-3042265f4a1b) initialization TimeoutException, already waiting 660000 ms ,ip: hadoop ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
2022-03-04 15:34:23.157 [INFO ] [BaseTaskScheduler-Thread-65 ] o.a.l.o.e.ComputationEngineConnManager (41) [info] - mark_2 received EngineAskAsyncResponse id: hadoop:9101_21 serviceInstance: ServiceInstance(linkis-cg-linkismanager, hadoop:9101)
2022-03-04 15:34:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-15 ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from taskFutureCache size 1
2022-03-04 15:35:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-6 ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from taskFutureCache size 0

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@linkis.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: dev-h...@linkis.apache.org