lucyilys opened a new issue #1644:
URL: https://github.com/apache/incubator-linkis/issues/1644


   ### Search before asking
   
   - [X] I searched the [issues](https://github.com/apache/incubator-linkis/issues) and found no similar issues.
   
   
   ### Linkis Component
   
   linkis-cg-entrance, linkis-cg-manager, linkis-cg-engineConnplugin, linkis-cg-engineConnManager
   
   ### What happened + What you expected to happen
   
   2022-03-04 15:34:20.725 [ERROR] [BaseTaskScheduler-Thread-71             ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (79) [run] - Failed to execute task astJob_7_retry_1
   org.apache.linkis.orchestrator.ecm.exception.ECMPluginErrorException: errCode: 12003 ,desc: hadoop:9101_16 Failed  to async get EngineNode java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.linkis.common.utils.Utils$.aux$1(Utils.scala:191)
        at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:199)
        at org.apache.linkis.common.utils.Utils$.waitUntil(Utils.scala:202)
        at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply$mcV$sp(EngineAsyncResponseCache.scala:80)
        at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
        at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap$$anonfun$getAndRemove$1.apply(EngineAsyncResponseCache.scala:80)
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
        at org.apache.linkis.orchestrator.ecm.cache.EngineAsyncResponseCacheMap.getAndRemove(EngineAsyncResponseCache.scala:81)
        at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:156)
        at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101)
        at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132)
        at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115)
        at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91)
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69)
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40)
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69)
        at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62)
        at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
        at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.getEngineNodeAskManager(ComputationEngineConnManager.scala:165) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.ecm.ComputationEngineConnManager.askEngineConnExecutor(ComputationEngineConnManager.scala:101) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.ecm.AbstractEngineConnManager.getAvailableEngineConnExecutor(EngineConnManager.scala:132) ~[linkis-orchestrator-ecm-plugin-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.createExecutor(DefaultCodeExecTaskExecutorManager.scala:115) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.computation.execute.DefaultCodeExecTaskExecutorManager.askExecutor(DefaultCodeExecTaskExecutorManager.scala:91) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask$$anonfun$execute$1.apply(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
        at org.apache.linkis.common.utils.Utils$.tryCatch(Utils.scala:40) ~[linkis-common-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.computation.physical.CodeLogicalUnitExecTask.execute(CodeLogicalUnitExecTask.scala:69) ~[linkis-computation-orchestrator-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.plans.physical.RetryExecTask.execute(RetryExecTask.scala:62) ~[linkis-orchestrator-core-1.0.3.jar:1.0.3]
        at org.apache.linkis.orchestrator.strategy.async.AsyncExecTaskRunnerImpl.run(AsyncExecTaskRunnerImpl.scala:62) [linkis-orchestrator-core-1.0.3.jar:1.0.3]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
   
   2022-03-04 15:34:20.725 [INFO ] [BaseTaskScheduler-Thread-71             ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (86) [transientStatus] - TaskastJob_7_retry_1 status flip error! Cause: Failed to flip from Cancelled to Failed.
   2022-03-04 15:34:20.734 [INFO ] [qtp1277477898-206                       ] o.a.l.e.r.EntranceRestfulApi (407) [kill] - end to kill job LINKISCLI_hadoop_spark_1
   2022-03-04 15:34:20.743 [INFO ] [CodeReheaterNotifyTaskConsumer          ] o.a.l.o.s.a.AsyncTaskManager (177) [apply] - user key hadoop-LINKISCLI,spark-2.4.3, executionTaskId execution_7 to addNumber: 1
   2022-03-04 15:34:20.743 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (41) [info] - ExecTaskRunner Submit execTask(astJob_7_stage_14) to running
   2022-03-04 15:34:20.744 [ERROR] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.GatherStrategyStageInfoExecTask (62) [error] - There are Tasks execution failure of stage astJob_7_stage_14, now mark ExecutionTask as failed
   2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncTaskManager (341) [onRootTaskResponseEvent] - received rootTaskResponseEvent astJob_7_job_14
   2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncTaskManager (320) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3
   2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncTaskManager (336) [clearExecutionTask] - executionTask(execution_7) finished user key hadoop-LINKISCLI,spark-2.4.3, minusNumber: 0
   2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - execution_7 change status Inited => Failed.
   2022-03-04 15:34:20.744 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.e.i.ExecutionImpl (41) [info] - astJob_7_job_14 completed, Now to remove from execTaskToExecutionTasks
   2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.e.e.EngineExecuteAsyncReturn (41) [info] - Job with execId-LINKISCLI_hadoop_spark_1 and subJobId : 17 from orchestrator completed with state ErrorExecuteResponse(21304, Task is Failed,errorMsg: Job be cancelled,null)
   2022-03-04 15:34:20.758 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.e.j.EntranceExecutionJob (41) [info] - taskID:17execID:LINKISCLI_hadoop_spark_1 change status Running => Cancelled.
   2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.e.j.EntranceExecutionJob (334) [close] - job:LINKISCLI_hadoop_spark_1 is closing
   2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.e.l.CacheLogWriter (63) [close] -  hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log logWriter close
   2022-03-04 15:34:20.780 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.e.l.CacheLogWriter (40) [write] - hdfs:///tmp/linkis/log/2022-03-04/LINKISCLI/hadoop/17.log write first one line log
   2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.e.i.BaseExecutionTask (41) [info] - Finished to ExecutionTask(execution_7) with status Failed
   2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncTaskManager (361) [markExecutionTaskCompleted] - Finished to mark executionTask(execution_7) rootExecTask astJob_7_job_14 to  Completed.
   2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (71) [run] - Failed to execute ExecTask(astJob_7_stage_14)
   2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncExecTaskRunnerImpl (90) [transientStatus] - astJob_7_stage_14 change status Inited => Failed.
   2022-03-04 15:34:20.801 [INFO ] [BaseTaskScheduler-Thread-74             ] o.a.l.o.s.a.AsyncTaskManager (204) [addCompletedTask] - astJob_7_stage_14 task completed, now remove from taskManager
   2022-03-04 15:34:21.891 [INFO ] [RpcMessageScheduler-ThreadPool-93       ] o.a.l.o.e.s.i.DefaultEngineAsyncResponseService (41) [info] - Failed to create engine hadoop:9101_14, can retry true
   2022-03-04 15:34:23.100 [WARN ] [BaseTaskScheduler-Thread-65             ] o.a.l.o.e.ComputationEngineConnManager (50) [warn] - mark_2 Failed to askEngineAskRequest time taken (661740), errCode: 12003 ,desc: hadoop:9101_14 Failed  to async get EngineNode LinkisRetryException: errCode: 30001 ,desc: Waiting for engineNode:AMEngineNode{nodeStatus=null, lock='null', serviceInstance=ServiceInstance(linkis-cg-engineconn, hadoop:39841), owner='hadoop'}(0df6e0e8-461a-432c-8a63-3042265f4a1b) initialization TimeoutException, already waiting 660000 ms ,ip: hadoop ,port: 9101 ,serviceKind: linkis-cg-linkismanager ,ip: hadoop ,port: 9104 ,serviceKind: linkis-cg-entrance
   2022-03-04 15:34:23.157 [INFO ] [BaseTaskScheduler-Thread-65             ] o.a.l.o.e.ComputationEngineConnManager (41) [info] - mark_2 received EngineAskAsyncResponse id: hadoop:9101_21 serviceInstance: ServiceInstance(linkis-cg-linkismanager, hadoop:9101)
   2022-03-04 15:34:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-15      ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from  taskFutureCache size 1
   2022-03-04 15:35:50.802 [INFO ] [Linkis-Default-Scheduler-Thread-6       ] o.a.l.o.e.i.BaseTaskScheduler (41) [info] - Clear finished task from  taskFutureCache size 0
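
One possible reading of the trace above, not confirmed anywhere in this report: the worker thread was sleeping inside a polling wait (`Utils.waitUntil` calling `Thread.sleep`) while waiting for the EngineNode, and the kill request interrupted it. That would be consistent with the `Cancelled` status and the "Job be cancelled" message in the log. The sketch below reproduces only that general JVM pattern. The class `InterruptedWaitSketch` and its methods are hypothetical stand-ins, not Linkis code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

class InterruptedWaitSketch {

    // Hypothetical stand-in for a polling helper: re-check the condition,
    // sleeping between checks. Thread.sleep throws InterruptedException
    // when the waiting thread is interrupted.
    static void waitUntil(BooleanSupplier condition, long pollMillis)
            throws InterruptedException {
        while (!condition.getAsBoolean()) {
            Thread.sleep(pollMillis);
        }
    }

    // Starts a task that waits for a condition that never becomes true,
    // cancels it with interruption, and reports which exception the task saw.
    static String runAndCancel() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicReference<String> observed = new AtomicReference<>("none");
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    waitUntil(() -> false, 50L); // the response never arrives
                } catch (InterruptedException e) {
                    observed.set(e.getClass().getSimpleName());
                } finally {
                    finished.countDown();
                }
            });
            started.await();      // make sure the task is actually running
            future.cancel(true);  // cancel and interrupt the waiting thread
            finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return observed.get();
    }

    public static void main(String[] args) {
        System.out.println("task observed: " + runAndCancel());
    }
}
```

If this reading is right, the `InterruptedException` would be a side effect of the cancellation, and the `TimeoutException, already waiting 660000 ms` warning logged a few seconds later would be the entry to investigate first.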
   
   
   ### Relevant platform
   
   
   
   ### Reproduction script
   
   
   
   ### Anything else
   
   
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@linkis.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


