[ 
https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-927:
--------------------------------

    Status: Open  (was: Patch Available)

Looked at the patch. Have some comments:

TaskLog.java
 - Creation of the {{localFS}} should not be in a static block. In the past 
also, we did this and realized it creates a kind of circular initialization of 
loggers and results in NPE while creating log objects which can be seen in 
task-logs. The current way of creation of {{localFS}} should be retained.
 - We can move {{DEFAULT_USER_LOG_RETAIN_HOURS}} to {{TaskLogsCleanupThread}}.
 - Shall we rename {{getJobUserLogDir() to a simpler {{getJobDir()}}? And 
{{getBaseDir(String)}} to {{getAttemptDir(String)}} to be clear? I think it's 
ok like this as {{TaskLog.getJobDir()}} clearly means it is job-dir for logs.
 - Also, I think it's high time {{TaskLog.java}} is made 
@InterfaceAudience.Private.

TaskLogCleanupThread
 - Rename the class to {{TaskLogsMonitor}}, so that we are consistent going 
forward with MAPREDUCE-1100.
 - Set the audience visibility of the class to private?
 - {{threadSleepTime}} is not configurable. May not be a public documented 
configuration, but still we need one.
 - Constructor: Shouldn't the volume on which the disk-service works be 
{{getUserLogsDir()}} instead of {{getBaseLogDir()}}? The correctness is not 
lost with the current patch as we are always passing absolute paths to the 
disk-service, but I think we should change it anyways. Also can't we simply 
construct a local-filesystem here itself, instead of calling 
TaskLog.getLocalFileSystem()? This will mostly avoid your changes regarding my 
first comment in TaskLog.java above.
 - For the sake of correctness, in {{removeOldUserLogs()}}, the job should be 
removed only after the deletion of the log file.
 - Throughout the class, we should use a clock instead of directly using 
{{System.currentMillis()}}. This will better the testing.
 - Shouldn't {{clearOldUserLogs()}} be done as part of constructor itself? This 
is the pattern that {{MRAsyncDiskService}} uses, for example.

TaskTracker.java
 - The {{logscleanup}} thread is not joined/killed in the 
{{TaskTracker.close()}}. So, there will be zombie threads in the system on a 
re-init and may well interfere with the new thread.
 - Shouldn't we be setting 770-user-mapred permissions on userlogs/$jobid as 
part of job-localization? Granted 711 is enough for now, but this slightly 
deviates from the current security on the directories - we always have the most 
secure permissions possible on the files/dirs. If we do this, 
TestLocalizationWithLinuxTaskController should also test the permissions of the 
joblogdir.
 - The following code fragment at +1035 can be moved out of 
{{localizeJobFiles()}} into a new method {{initializeJobLogDir()}} and can be 
called directly from {{localizeJob()}}.
{code}
  taskLogCleanupThread.removeJobFromLogDeletion(jobId);
  localizer.initializeJobUserLogDir(jobId);
{code}
 - One case is still not handled: TT reinits while the Job is still running. 
After re-init, no tasks of the running job arrive at this TT. Retain-hours 
after re-init TT removes the job's tasks' logs even though the job is still 
running elsewhere. Doing this will need TT to specially communicate with JT and 
from what I understand, we are not doing it here. If that is the case, can we 
simply add a test for this too in {{TestUserLogCleanup}}?

TestUserLogCleanup
 - Use {{TaskTracker.initializeJobLogDir()}} mentioned above in 
{{setupJobLogDirectory()}}.
 - Once we use a clock in TaskLogCleanupThread,
    -- the tests can be modified to use this.
    -- we can test what exactly the modified retainTimeStamp is after re-init.
 - We should create some attempt dirs also in the joblog dir with appropriate 
permissions and verify the proper cleanup.
 - Document(javadoc) the two tests?

test-taskcontroller.c
 - This needs to be fixed to reflect the new directory hierarchy of the logs.

mapred_tutorial.xml
 - Should TaskLogs section be changed to explicitly specify the new directory 
hierarchy?

Cancelling the patch for the sake of these changes.

> Cleanup of task-logs should happen in TaskTracker instead of the Child
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-927
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: patch-927-1.txt, patch-927.txt
>
>
> Task logs' cleanup is being done in Child now. This is undesirable atleast 
> for two reasons: 1) failures while cleaning up will affect the user's tasks, 
> and 2) the task's wall time will get affected due to operations that TT 
> actually should own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to