[ https://issues.apache.org/jira/browse/HADOOP-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hemanth Yamijala updated HADOOP-5883:
-------------------------------------

    Attachment: HADOOP-5883.patch

The attached patch file incorporates the changes mentioned in the earlier comment.

The key change is to identify processes older than a certain time. To do this, the process tree class keeps track of an 'age' for each process, which is how many times the process tree has seen a process with this PID. This count is updated every time the process tree is refreshed, which is once every monitoring iteration. The monitoring thread can now ask for the cumulative virtual memory of processes over a certain 'age'.

For the sake of simplicity, I've assumed the monitoring interval determines how aged processes are. It is possible to do something more sophisticated. For example, we could determine the walltime of the process by making a system call. There doesn't seem to be a direct API for getting the 'walltime' of a process. One hack would be to read the creation time of the pid directory in /proc and subtract it from the time of day on each iteration. However, this could be a costly operation without giving much more accuracy.

Summary of the changes:
- TaskMemoryManagerThread: changes to the logic that determines whether a task is over its limit.
- ProcfsBasedProcessTree: introduces an age for processes and updates it on each refresh.
- Most of the remaining changes enable fast unit tests to be written. I think it is a good idea to move many of the tests for these two classes to the testing mechanism I'm using, but that's the focus of another JIRA.
- Introduced new tests that cover just these changes.

I suppose this patch will need merging with HADOOP-5881, which is likely to go in first. I will update the patch with those changes once it's ready. This is just put up for early consumption.

Also missing are documentation updates for the new semantics of monitoring. Again, I will finish that after the HADOOP-5881 merge.
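
To make the 'age' bookkeeping described above concrete, here is a minimal sketch in Java. It is not the code from the attached patch: the class name AgedProcessTree, the snapshot map passed to refresh(), and the method getCumulativeVmem(int) are hypothetical names chosen for illustration. The real ProcfsBasedProcessTree reads /proc and may differ in structure and naming. The sketch only shows the idea that a PID's age is bumped on every refresh in which it is seen, and that memory can be summed over processes above a given age.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of per-PID age tracking; not the patch's class. */
class AgedProcessTree {

    /** Virtual memory (bytes) and age (number of refreshes that saw this PID). */
    static final class ProcessInfo {
        final long vmemBytes;
        final int age;

        ProcessInfo(long vmemBytes, int age) {
            this.vmemBytes = vmemBytes;
            this.age = age;
        }
    }

    private Map<Integer, ProcessInfo> processes = new HashMap<>();

    /**
     * Rebuilds the tree from a snapshot of pid -> virtual memory in bytes.
     * Assumption: the caller supplies the snapshot; a real implementation
     * would read it from /proc once per monitoring iteration.
     */
    void refresh(Map<Integer, Long> snapshot) {
        Map<Integer, ProcessInfo> updated = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : snapshot.entrySet()) {
            ProcessInfo previous = processes.get(entry.getKey());
            // A PID seen for the first time starts at age 1; otherwise age grows by 1.
            int age = (previous == null) ? 1 : previous.age + 1;
            updated.put(entry.getKey(), new ProcessInfo(entry.getValue(), age));
        }
        // PIDs missing from the snapshot are dropped along with their age.
        processes = updated;
    }

    /** Sums virtual memory over processes whose age is strictly greater than olderThanAge. */
    long getCumulativeVmem(int olderThanAge) {
        long total = 0;
        for (ProcessInfo info : processes.values()) {
            if (info.age > olderThanAge) {
                total += info.vmemBytes;
            }
        }
        return total;
    }
}
```

Under these assumptions, a monitoring thread could compare getCumulativeVmem(0) (all processes) against one limit and getCumulativeVmem(1) (processes that survived at least one full monitoring interval) against another, so that a short-lived spike from a freshly started child process is not counted in the second figure. The exact thresholds used by TaskMemoryManagerThread are not stated in this message.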

> TaskMemoryMonitorThread might shoot down tasks even if their processes momentarily exceed the requested memory
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5883
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Hemanth Yamijala
>         Attachments: HADOOP-5883.patch
>
>
> Currently the TaskMemoryMonitorThread kills tasks as soon as it detects they are consuming more memory than the max value specified. There are valid cases (see HADOOP-5059) where if a program is executed from the task, it might momentarily occupy twice the amount of memory for a short time. Ideally the monitoring thread should handle this case.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.