[
https://issues.apache.org/jira/browse/HADOOP-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730220#action_12730220
]
Hemanth Yamijala commented on HADOOP-4491:
------------------------------------------
Some other comments on the code:
- The code models some of the commands passed to the taskcontroller as 'first
task' for JVM or 'not first task' for JVM. This complicates the contract
between the TT and the task-controller. I think at a minimum these decisions
should be restricted only to the JVM manager, and the contract between the TT
and task-controller should be handled by modelling new commands like
FINALIZE_JVM and FINALIZE_TASK.
- The patch introduces a log directory argument to the task-controller. I think
this is not required as there is only one value for the hadoop.log.dir for all
tasks.
- Code related to creation of work dirs is removed from TaskRunner. Where is it
created now ?
- Permissions for job directory is changed multiple times - once per each task.
- finalizeTaskDirs (or parts of it) needs to be synchronized.
- In the case of jvm reuse, we still need to make the output available. Because
it seems the task completion event is sent as soon as the task is done not
after jvm exits. This is only a theoretical case possibly, but it still will be
good to keep the code paths identical.
- Path permissions should be taken care of ? so, if mapred.local.dir doesn't
have execute permissions for others, we need to set them.
- In setup, we seem to be setting permissions even if directory creation fails.
also shouldn't we set permissions even if it exists.. so that it is right as
per our requirement.
- Didn't understand the purpose of initStatus. Since it starts out being true,
wouldn't it always remain true ?
- Cache directory probably doesn't need 777 because it is not written to by the
tasks. We can probably retain this if HADOOP-4490 set like permissions, since
this will be addressed in HADOOP-4493.
- getBaseIntermediateOutputDir seems an overkill if it is just returning a
constant.
- Changes in SpillRecord seem to be unnecessary.
Some nits:
- There are some else blocks without code, but with a code comment explaining
why there's no else block. While useful, it is somewhat unconventional. I would
recommend the reason be moved to a comment starting the if block itself.
- TODOs in the patch must be discussed and resolved.
- TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere.
- Rename finalizeTaskDirs as finalizeTask - it matches with the naming
convention for the other apis in taskcontroller.
- Add a comment about why task-work is 755.
- mapred.child.local.dir has an extra comma at the end.
> Per-job local data on the TaskTracker node should have right access-control
> ---------------------------------------------------------------------------
>
> Key: HADOOP-4491
> URL: https://issues.apache.org/jira/browse/HADOOP-4491
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: security
> Reporter: Arun C Murthy
> Assignee: Vinod K V
> Attachments: HADOOP-4491-20090623-common.1.txt,
> HADOOP-4491-20090623-mapred.1.txt, HADOOP-4491-20090703-common.1.txt,
> HADOOP-4491-20090703-common.txt, HADOOP-4491-20090703.1.txt,
> HADOOP-4491-20090703.txt, HADOOP-4491-20090707-common.txt,
> HADOOP-4491-20090707.txt
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.