quanzhian opened a new issue #5461:
URL: https://github.com/apache/dolphinscheduler/issues/5461


   In the class org.apache.dolphinscheduler.server.utils.ProcessUtils:
   
   
       public static void kill(TaskExecutionContext taskExecutionContext) {
           try {
               int processId = taskExecutionContext.getProcessId();
               if (processId == 0) {
                   logger.error("process kill failed, process id :{}, task id:{}",
                       processId, taskExecutionContext.getTaskInstanceId());
                   return;
               }
               // getPidsStr(processId) here sometimes fails to resolve the PIDs,
               // so the "sudo kill -9" command below is run without a pid argument
               // and errors out; an empty-value check is needed upstream.
               String cmd = String.format("sudo kill -9 %s", getPidsStr(processId));

               logger.info("process id:{}, cmd:{}", processId, cmd);

               OSUtils.exeCmd(cmd);

           } catch (Exception e) {
               logger.error("kill task failed", e);
           }
           // find log and kill yarn job
           killYarnJob(taskExecutionContext);
       }
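
A minimal sketch of the requested empty-value check. This is an illustration only, not the upstream fix; the class and helper names (`KillGuard`, `buildKillCmd`) are hypothetical, and it assumes `getPidsStr` keeps returning a plain `String`:

```java
// Hypothetical guard: build the kill command only when PIDs were resolved.
public class KillGuard {

    // Returns the kill command, or null when the PID string is empty,
    // in which case the caller should skip OSUtils.exeCmd() entirely.
    public static String buildKillCmd(String pidsStr) {
        if (pidsStr == null || pidsStr.trim().isEmpty()) {
            // getPidsStr() came back empty: "sudo kill -9" with no pid
            // argument makes kill print its usage text and exit non-zero,
            // which surfaces as the ExitCodeException in the log below.
            return null;
        }
        return String.format("sudo kill -9 %s", pidsStr.trim());
    }

    public static void main(String[] args) {
        System.out.println(buildKillCmd("11319"));  // normal case
        System.out.println(buildKillCmd(""));       // null: skip execution
    }
}
```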
   
   
   
   The exception log is as follows:
   
   
   [INFO] 2021-05-12 18:28:32.942  - [taskAppId=TASK-107-71-109]:[347] - task 
run command:
   sudo -u dolphinscheduler sh 
/tmp/dolphinscheduler/exec/process/2/107/71/109/107_71_109.command
   [INFO] 2021-05-12 18:28:32.942  - [taskAppId=TASK-107-71-109]:[228] - 
process start, process id is: 11319
   [INFO] 2021-05-12 18:28:32.942  - [taskAppId=TASK-107-71-109]:[237] - 
process has exited, execute 
path:/tmp/dolphinscheduler/exec/process/2/107/71/109, processId:11319 
,exitStatusCode:0
   [ERROR] 2021-05-12 18:28:32.942  - [taskAppId=TASK-107-71-109]:[256] - 
process has failure , exitStatusCode : 0 , ready to kill ...
   [INFO] 2021-05-12 18:28:32.969 
org.apache.dolphinscheduler.server.utils.ProcessUtils:[373] - process id:11319, 
cmd:sudo kill -9 
   [ERROR] 2021-05-12 18:28:32.981 
org.apache.dolphinscheduler.server.utils.ProcessUtils:[378] - kill task failed
   org.apache.dolphinscheduler.common.shell.AbstractShell$ExitCodeException: 
   Usage:
    kill [options] <pid|name> [...]
   
   Options:
    -a, --all              do not restrict the name-to-pid conversion to 
processes
                           with the same uid as the present process
    -s, --signal <sig>     send specified signal
    -q, --queue <sig>      use sigqueue(2) rather than kill(2)
    -p, --pid              print pids without signaling them
    -l, --list [=<signal>] list signal names, or convert one to a name
    -L, --table            list signal names and numbers
   
    -h, --help     display this help and exit
    -V, --version  output version information and exit
   
   For more details see kill(1).
   
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.runCommand(AbstractShell.java:209)
        at 
org.apache.dolphinscheduler.common.shell.AbstractShell.run(AbstractShell.java:124)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execute(ShellExecutor.java:127)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:104)
        at 
org.apache.dolphinscheduler.common.shell.ShellExecutor.execCommand(ShellExecutor.java:87)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeShell(OSUtils.java:394)
        at 
org.apache.dolphinscheduler.common.utils.OSUtils.exeCmd(OSUtils.java:384)
        at 
org.apache.dolphinscheduler.server.utils.ProcessUtils.kill(ProcessUtils.java:375)
        at 
org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:257)
        at 
org.apache.dolphinscheduler.server.worker.task.qtdataIntegration.QtDiTask.handle(QtDiTask.java:166)
        at 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:134)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   [INFO] 2021-05-12 18:28:33.943  - [taskAppId=TASK-107-71-109]:[129] -  -> 
flinkx starting ...
        18:28:33.378 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.rpc.address, localhost
        18:28:33.381 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.rpc.port, 6123
        18:28:33.381 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.heap.size, 1024m
        18:28:33.381 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: taskmanager.heap.size, 1024m
        18:28:33.381 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: taskmanager.numberOfTaskSlots, 1
        18:28:33.381 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: parallelism.default, 1
   [INFO] 2021-05-12 18:28:33.982 
org.apache.dolphinscheduler.service.log.LogClientService:[100] - view log path 
/mnt/services/dolphinscheduler136/logs/107/71/109.log
   [INFO] 2021-05-12 18:28:33.988 
org.apache.dolphinscheduler.remote.NettyRemotingClient:[403] - netty client 
closed
   [INFO] 2021-05-12 18:28:33.988 
org.apache.dolphinscheduler.service.log.LogClientService:[59] - logger client 
closed
   [INFO] 2021-05-12 18:28:33.989 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[142] - task 
instance id : 109,task final status : FAILURE
   [INFO] 2021-05-12 18:28:33.989 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[162] - 
develop mode is: false
   [INFO] 2021-05-12 18:28:33.989 
org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[180] - exec 
local path: /tmp/dolphinscheduler/exec/process/2/107/71/109 cleared.
   [INFO] 2021-05-12 18:28:34.944  - [taskAppId=TASK-107-71-109]:[129] -  -> 
18:28:34.159 [main] INFO com.dtstack.flinkx.launcher.perjob.PerJobSubmitter - 
start to submit per-job task, LauncherOptions = Options{mode='yarnPer', 
job='/tmp/dolphinscheduler/exec/process/2/107/71/109/107_71_109_job.json', 
monitor='null', jobid='Flink Job', flinkconf='/mnt/services/flink-1.8.8/conf', 
pluginRoot='/mnt/services/flinkx/plugins', remotePluginPath='null', 
yarnconf='/etc/hadoop/conf', parallelism='1', priority='1', queue='default', 
flinkLibJar='/mnt/services/flink-1.8.8/lib', 
confProp='{"flink.checkpoint.interval":60000}', p='', s='null', 
pluginLoadMode='shipfile', appId='null'}
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.rpc.address, localhost
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.rpc.port, 6123
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: jobmanager.heap.size, 1024m
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: taskmanager.heap.size, 1024m
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: taskmanager.numberOfTaskSlots, 1
        18:28:34.167 [main] INFO 
org.apache.flink.configuration.GlobalConfiguration - Loading configuration 
property: parallelism.default, 1
        18:28:34.305 [main] WARN org.apache.hadoop.util.NativeCodeLoader - 
Unable to load native-hadoop library for your platform... using builtin-java 
classes where applicable
        18:28:34.360 [main] INFO 
org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to 
dolphinscheduler (auth:SIMPLE)
        log4j:WARN No appenders could be found for logger 
(org.apache.hadoop.yarn.ipc.YarnRPC).
        log4j:WARN Please initialize the log4j system properly.
        log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig 
for more info.
        18:28:34.543 [main] INFO 
com.dtstack.flinkx.launcher.perjob.PerJobClusterClientBuilder - ----init yarn 
success ----
        18:28:34.666 [main] INFO org.apache.hadoop.conf.Configuration - 
resource-types.xml not found
        18:28:34.666 [main] INFO 
org.apache.hadoop.yarn.util.resource.ResourceUtils - Unable to find 
'resource-types.xml'.
        18:28:34.704 [main] WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - The JobManager or 
TaskManager memory is below the smallest possible YARN Container size. The 
value of 'yarn.scheduler.minimum-allocation-mb' is '1024'. Please increase the 
memory size.YARN will allocate the smaller containers but the scheduler will 
account for the minimum-allocation-mb, maybe not all instances you requested 
will start.
        18:28:34.704 [main] INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: 
ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, 
numberTaskManagers=1, slotsPerTaskManager=1}
   [INFO] 2021-05-12 18:28:35.945  - [taskAppId=TASK-107-71-109]:[129] -  -> 
18:28:35.024 [main] WARN 
org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
        18:28:35.033 [main] WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration 
directory ('/mnt/services/flink-1.8.8/conf') contains both LOG4J and Logback 
configuration files. Please delete or rename one of them.
   [INFO] 2021-05-12 18:28:36.946  - [taskAppId=TASK-107-71-109]:[129] -  -> 
18:28:36.716 [main] INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - 
Submitting application master application_1609329939009_5348
        18:28:36.741 [main] INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application 
application_1609329939009_5348
        18:28:36.742 [main] INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster 
to be allocated
        18:28:36.744 [main] INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, 
current state ACCEPTED
   [INFO] 2021-05-12 18:28:40.946  - [taskAppId=TASK-107-71-109]:[129] -  -> 
18:28:40.025 [main] INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - 
YARN application has been deployed successfully.
        18:28:40.320 [main] INFO org.apache.flink.runtime.rest.RestClient - 
Rest client endpoint started.
        18:28:40.323 [main] INFO com.dtstack.flinkx.util.YarnUtil - 
HADOOP_CONF_DIR:/etc/hadoop/conf
        18:28:40.372 [main] INFO com.dtstack.flinkx.util.YarnUtil - get 1080 
config from /etc/hadoop/conf/core-site.xml
        18:28:40.380 [main] INFO com.dtstack.flinkx.util.YarnUtil - get 23 
config from /etc/hadoop/conf/hdfs-site.xml
        18:28:40.400 [main] INFO com.dtstack.flinkx.util.YarnUtil - hdfs 
path:hdfs:///apps/flinkx/2021-05-12/816d1ef47c5a5cbd5557580126b17f22
        18:28:40.401 [main] INFO com.dtstack.flinkx.util.YarnUtil - 
monitorUrl:bigdata-master01:8088/proxy/application_1609329939009_5348
        18:28:40.421 [main] INFO 
com.dtstack.flinkx.launcher.perjob.PerJobSubmitter - deploy per_job with appId: 
application_1609329939009_5348}, jobId: 816d1ef47c5a5cbd5557580126b17f22
   [INFO] 2021-05-12 18:28:40.947  - [taskAppId=TASK-107-71-109]:[127] - 
FINALIZE_SESSION
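
The `Usage:` dump in the stack trace above comes from `kill` itself: with the PID string empty, the worker effectively ran `sudo kill -9` with no pid argument. This failure mode can be reproduced directly (no sudo needed for the demonstration; the exact usage wording varies between the shell builtin and /bin/kill):

```shell
# "kill -9" with no pid argument prints a usage message and exits
# non-zero, which AbstractShell surfaces as an ExitCodeException.
kill -9 2>&1 | head -n 1    # shows the usage line
kill -9 >/dev/null 2>&1
echo "exit code: $?"        # non-zero
```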
   
   
   

