[ 
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730220#action_12730220
 ] 

Hemanth Yamijala commented on HADOOP-4491:
------------------------------------------

Some other comments on the code:

- The code models some of the commands passed to the taskcontroller as 'first 
task' for JVM or 'not first task' for JVM. This complicates the contract 
between the TT and the task-controller. I think at a minimum these decisions 
should be restricted only to the JVM manager, and the contract between the TT 
and task-controller should be handled by modelling new commands like 
FINALIZE_JVM and FINALIZE_TASK.
- The patch introduces a log directory argument to the task-controller. I think 
this is not required as there is only one value for the hadoop.log.dir for all 
tasks.
- Code related to creation of work dirs is removed from TaskRunner. Where is it 
created now ?
- Permissions for job directory is changed multiple times - once per each task.
- finalizeTaskDirs (or parts of it) needs to be synchronized.
- In the case of jvm reuse, we still need to make the output available. Because 
it seems the task completion event is sent as soon as the task is done not 
after jvm exits. This is only a theoretical case possibly, but it still will be 
good to keep the code paths identical.
- Path permissions should be taken care of ? so, if mapred.local.dir doesn't 
have execute permissions for others, we need to set them.
- In setup, we seem to be setting permissions even if directory creation fails. 
also shouldn't we set permissions even if it exists.. so that it is right as 
per our requirement.
- Didn't understand the purpose of initStatus. Since it starts out being true, 
wouldn't it always remain true ?
- Cache directory probably doesn't need 777 because it is not written to by the 
tasks. We can probably retain this if HADOOP-4490 set like permissions, since 
this will be addressed in HADOOP-4493.
- getBaseIntermediateOutputDir seems an overkill if it is just returning a 
constant.
- Changes in SpillRecord seem to be unnecessary.

Some nits:
- There are some else blocks without code, but with a code comment explaining 
why there's no else block. While useful, it is somewhat unconventional. I would 
recommend the reason be moved to a comment starting the if block itself.
- TODOs in the patch must be discussed and resolved.
- TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere.
- Rename finalizeTaskDirs as finalizeTask - it matches with the naming 
convention for the other apis in taskcontroller.
- Add a comment about why task-work is 755.
- mapred.child.local.dir has an extra comma at the end.

> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-4491
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4491
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Arun C Murthy
>            Assignee: Vinod K V
>         Attachments: HADOOP-4491-20090623-common.1.txt, 
> HADOOP-4491-20090623-mapred.1.txt, HADOOP-4491-20090703-common.1.txt, 
> HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt, 
> HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt, 
> HADOOP-4491-20090707.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to