[ https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod K V updated MAPREDUCE-927: -------------------------------- Status: Open (was: Patch Available) Looked at the patch. Have some comments: TaskLog.java - Creation of the {{localFS}} should not be in a static block. In the past also, we did this and realized it creates a kind of circular initialization of loggers and results in NPE while creating log objects which can be seen in task-logs. The current way of creation of {{localFS}} should be retained. - We can move {{DEFAULT_USER_LOG_RETAIN_HOURS}} to {{TaskLogsCleanupThread}}. - Shall we rename {{getJobUserLogDir() to a simpler {{getJobDir()}}? And {{getBaseDir(String)}} to {{getAttemptDir(String)}} to be clear? I think it's ok like this as {{TaskLog.getJobDir()}} clearly means it is job-dir for logs. - Also, I think it's high time {{TaskLog.java}} is made @InterfaceAudience.Private. TaskLogCleanupThread - Rename the class to {{TaskLogsMonitor}}, so that we are consistent going forward with MAPREDUCE-1100. - Set the audience visibility of the class to private? - {{threadSleepTime}} is not configurable. May not be a public documented configuration, but still we need one. - Constructor: Shouldn't the volume on which the disk-service works be {{getUserLogsDir()}} instead of {{getBaseLogDir()}}? The correctness is not lost with the current patch as we are always passing absolute paths to the disk-service, but I think we should change it anyways. Also can't we simply construct a local-filesystem here itself, instead of calling TaskLog.getLocalFileSystem()? This will mostly avoid your changes regarding my first comment in TaskLog.java above. - For the sake of correctness, in {{removeOldUserLogs()}}, the job should be removed only after the deletion of the log file. - Throughout the class, we should use a clock instead of directly using {{System.currentMillis()}}. This will better the testing. - Shouldn't {{clearOldUserLogs()}} be done as part of constructor itself? This is the pattern that {{MRAsyncDiskService}} uses, for example. TaskTracker.java - The {{logscleanup}} thread is not joined/killed in the {{TaskTracker.close()}}. So, there will be zombie threads in the system on a re-init and may well interfere with the new thread. - Shouldn't we be setting 770-user-mapred permissions on userlogs/$jobid as part of job-localization? Granted 711 is enough for now, but this slightly deviates from the current security on the directories - we always have the most secure permissions possible on the files/dirs. If we do this, TestLocalizationWithLinuxTaskController should also test the permissions of the joblogdir. - The following code fragment at +1035 can be moved out of {{localizeJobFiles()}} into a new method {{initializeJobLogDir()}} and can be called directly from {{localizeJob()}}. {code} taskLogCleanupThread.removeJobFromLogDeletion(jobId); localizer.initializeJobUserLogDir(jobId); {code} - One case is still not handled: TT reinits while the Job is still running. After re-init, no tasks of the running job arrive at this TT. Retain-hours after re-init TT removes the job's tasks' logs even though the job is still running elsewhere. Doing this will need TT to specially communicate with JT and from what I understand, we are not doing it here. If that is the case, can we simply add a test for this too in {{TestUserLogCleanup}}? TestUserLogCleanup - Use {{TaskTracker.initializeJobLogDir()}} mentioned above in {{setupJobLogDirectory()}}. - Once we use a clock in TaskLogCleanupThread, -- the tests can be modified to use this. -- we can test what exactly the modified retainTimeStamp is after re-init. - We should create some attempt dirs also in the joblog dir with appropriate permissions and verify the proper cleanup. - Document(javadoc) the two tests? test-taskcontroller.c - This needs to be fixed to reflect the new directory hierarchy of the logs. mapred_tutorial.xml - Should TaskLogs section be changed to explicitly specify the new directory hierarchy? Cancelling the patch for the sake of these changes. > Cleanup of task-logs should happen in TaskTracker instead of the Child > ---------------------------------------------------------------------- > > Key: MAPREDUCE-927 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-927 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: security, tasktracker > Affects Versions: 0.21.0 > Reporter: Vinod K V > Assignee: Amareshwari Sriramadasu > Priority: Blocker > Fix For: 0.21.0 > > Attachments: patch-927-1.txt, patch-927.txt > > > Task logs' cleanup is being done in Child now. This is undesirable atleast > for two reasons: 1) failures while cleaning up will affect the user's tasks, > and 2) the task's wall time will get affected due to operations that TT > actually should own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.