Ujjawal Kumar created MAPREDUCE-7410:
----------------------------------------

             Summary: Expose API to get task ids and individual task report 
given task Id from org.apache.hadoop.mapreduce.Job
                 Key: MAPREDUCE-7410
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7410
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobhistoryserver, yarn
            Reporter: Ujjawal Kumar
         Attachments: Screenshot 2022-10-06 at 4.46.48 PM.png

Currently org.apache.hadoop.mapreduce.Job exposes the getTaskReports(TaskType) API 
to fetch the task reports of either mappers or reducers. However, for MR jobs with 
a large number of tasks, fetching all task reports at once causes OOM issues, as 
seen with JHS (HistoryClientService.getTaskReports). HistoryClientService also 
exposes a getTaskReport() API, where a TaskId can be provided within the 
GetTaskReportRequest. org.apache.hadoop.mapreduce.Job could expose two APIs so that 
individual task reports can be fetched after listing the task IDs from the client side:
 # Job.getTasks(TaskType) -> List<TaskId> - This would return the TaskIds of all 
tasks of the given type to the client
 # Job.getTaskReport(TaskId) -> TaskReport - This would return the task report for 
a single task to the client
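
The list-then-fetch pattern described above can be sketched as follows. This is only an illustration under stated assumptions: TaskReportSource, InMemorySource, countUnfinished and the simplified TaskId/TaskReport types are hypothetical stand-ins written for this example, not Hadoop classes, and the two interface methods merely mirror the signatures proposed in this issue. The point is that the client holds one report at a time instead of the full array returned by getTaskReports(TaskType).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not a Hadoop class.
class TaskReportSketch {

    enum TaskType { MAP, REDUCE }

    /** Minimal task identifier: a task type plus an index within the job. */
    static final class TaskId {
        final TaskType type;
        final int index;

        TaskId(TaskType type, int index) {
            this.type = type;
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaskId)) {
                return false;
            }
            TaskId other = (TaskId) o;
            return type == other.type && index == other.index;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, index);
        }
    }

    /** Minimal task report: the task id and a state string. */
    static final class TaskReport {
        final TaskId id;
        final String state;

        TaskReport(TaskId id, String state) {
            this.id = id;
            this.state = state;
        }
    }

    /** The two calls proposed in the issue, modeled as an interface. */
    interface TaskReportSource {
        List<TaskId> getTasks(TaskType type);

        TaskReport getTaskReport(TaskId id);
    }

    /** In-memory source keyed by task id, loosely analogous to a tasks map. */
    static final class InMemorySource implements TaskReportSource {
        private final Map<TaskId, TaskReport> tasksMap = new LinkedHashMap<>();

        void add(TaskReport report) {
            tasksMap.put(report.id, report);
        }

        @Override
        public List<TaskId> getTasks(TaskType type) {
            // Return only the ids, which are far smaller than full reports.
            List<TaskId> ids = new ArrayList<>();
            for (TaskId id : tasksMap.keySet()) {
                if (id.type == type) {
                    ids.add(id);
                }
            }
            return ids;
        }

        @Override
        public TaskReport getTaskReport(TaskId id) {
            return tasksMap.get(id);
        }
    }

    /** Counts tasks that have not succeeded, fetching one report per call. */
    static int countUnfinished(TaskReportSource source, TaskType type) {
        int unfinished = 0;
        for (TaskId id : source.getTasks(type)) {
            TaskReport report = source.getTaskReport(id);
            if (report != null && !"SUCCEEDED".equals(report.state)) {
                unfinished++;
            }
        }
        return unfinished;
    }

    public static void main(String[] args) {
        InMemorySource source = new InMemorySource();
        source.add(new TaskReport(new TaskId(TaskType.MAP, 0), "SUCCEEDED"));
        source.add(new TaskReport(new TaskId(TaskType.MAP, 1), "FAILED"));
        source.add(new TaskReport(new TaskId(TaskType.REDUCE, 0), "RUNNING"));
        System.out.println("Unfinished map tasks: "
                + countUnfinished(source, TaskType.MAP));
    }
}
```

In a real client the per-task call would be a remote request, so the memory savings would come at the cost of more round trips; whether some form of batching is needed is not addressed by this sketch.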

For JHS, since JobHistoryParser.parse already parses the full history file by 
default and maintains the list of tasks within JobHistoryParser.JobInfo's 
tasksMap, this information should be easy to retrieve.

One open question is whether this can also be supported for requests that are 
redirected to MRClientService (within MRAppMaster) for running jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
