[ 
https://issues.apache.org/jira/browse/HADOOP-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707242#action_12707242
 ] 

Vinod K V commented on HADOOP-5420:
-----------------------------------

Looked at the patch. Some comments:
 - The patch needs sync-up with the latest patch on HADOOP-5771
 - Overall, I think the terms terminate and kill are better than the current 
usage of kill and destroy.

TaskController.java
 - killTask, destroyTask can better be terminateTask, killTask. And accordingly 
javadoc should be more explicit.
 - killTaskJVM: Null check of context.task belongs to LinuxTaskController, need 
it to be moved there.

DefaultTaskController.java
 - For WINDOWS process.destroy() is again not needed in destroyTask()

LinuxTaskController.java
 - We completely bring down the TT if we fail to write the pidFile to any one 
of the disks. This should change. We should fail only if we cannot write to any 
of the disks.
 - javadoc of setup() is incomplete. It should also quote the additional steps 
of writing TT pid-file.
 - Log message at +384 can be better: LOG.info("Written :: " + pid + "  to file 
" + ttPidFile);
 - TT's pidfile should not be made world writable (changeDirectoryPermissions 
does this). In fact, it can be set 000 permissions right-away. This would still 
work as we run as root to read it.
 - killHelper can better be something like finishTask()
 - What should we do when the commands for SIGTERM/SIGKILL fail?
 - buildTaskControllerExecutor() : For taskControllerCmd, you can instead use 
an Array
                                                            Why the usage of 
command.ordinal() in this method?
 - I think it would be clear to have a separate buildKillTaskArgs(). It would 
definitely help to also 'javadoc' the task-controller's command-line syntax for 
buildLaunchTaskArgs and this new buildKillTaskArgs(). With this you can 
completely remove the killHelper method()

ProcessTree.java
 - killProcessGroup() can be renamed to terminateProcessGroup(), killProcess() 
to terminateProcess(), sigKillProcessGroup to killProcessGroup() and 
sigKillProcess to killProcess().
 - Also, I think, with these semantics, we need a 
ProcessTree.isProcessGroupAlive(pgrpid) along with the current 
ProcessTree.isAlive(pid)

main.c
 - The argc length check should be command specific.
 - job_pid var can better be task_pid
 - You have a printf statement in the case DESTROY_TASK_JVM. This can be 
removed or redirected to log if needed.
 - Return values of verify_parent like UNABLE_TO_READ_TT_PID_FILE, 
OUT_OF_MEMORY are lost, as we are only returning 
INVALID_PROCESS_LAUNCHING_TASKCONTROLLER in error scenarios. Can't we do 
something here?

taskcontroller.c
 - Documentation of kill_user_task() should be fixed. Currently, it only refers 
to SIGTERM.
 - The get_pid_path() function can be removed. Also references to it from code 
documentation of run_task_as_user, kill_user_task can be removed too along with 
the TT_SYS_DIR var from taskcontroller.h
 - +343 "//Don't continue if the process is not alive anymore." should instead 
be "  // Don't continue if the process-group is not alive anymore."

TestKillSubProcessesWithTaskController
 - Rename to TestKillSubProcessesWithLinuxTaskController
 - Remove unused imports
 - Add Apache license header and add javadoc similar to other tests with 
LinuxTaskController
 - Just like in HADOOP-5771, we may want to verify job-ownership in this test 
too.
 - The test ran successfully, but it is running the jobs as the same user as 
the user running the cluster. Need to fix this.
 - Also, I see a lot of messages like this : "WARN  mapred.LinuxTaskController 
(LinuxTaskController.java:destroyTask(454)) - Exception thrown while sending 
destroy to the Task VM org.apache.hadoop.util.Shell$ExitCodeException:". Need 
to debug the cause for these.

We also need a test to verify cleanup of tasks that ignore SIGTERM.


> Support killing of process groups in LinuxTaskController binary
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5420
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5420
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: hadoop-5420-1.patch, hadoop-5420-2.patch, 
> hadoop-5420-3.patch, hadoop-5420.patch
>
>
> Support setsid based kill in LinuxTaskController.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to