Ujjawal Kumar created MAPREDUCE-7410:
----------------------------------------
Summary: Expose API to get task ids and individual task report
given task Id from org.apache.hadoop.mapreduce.Job
Key: MAPREDUCE-7410
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7410
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobhistoryserver, yarn
Reporter: Ujjawal Kumar
Attachments: Screenshot 2022-10-06 at 4.46.48 PM.png
Currently org.apache.hadoop.mapreduce.Job exposes the getTaskReports(TaskType)
API to fetch the task reports of either all mappers or all reducers. However,
for MR jobs with a large number of tasks, fetching all task reports at once
causes OOM issues, as seen with the JHS (HistoryClientService.getTaskReports).
HistoryClientService also exposes a getTaskReport() API, where a TaskId can be
provided within the GetTaskReportRequest. org.apache.hadoop.mapreduce.Job could
expose two new APIs so that the client can first list the task ids and then
fetch individual task reports:
# Job.getTasks(TaskType) -> List<TaskId> - This would return the TaskIds of
all tasks of the given type to the client
# Job.getTaskReport(TaskId) -> TaskReport - This would return the task report
for a single task to the client
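
The list-then-fetch pattern described above might look roughly like the sketch below from the client side. Everything in it is a hypothetical stand-in for illustration: SketchTaskType, SketchTaskReport, TaskReportSource, InMemoryTaskReportSource and FailedTaskCounter are not Hadoop classes, task ids are plain strings, and the two interface methods only mirror the shape of the proposed Job.getTasks and Job.getTaskReport, which do not exist yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real Hadoop classes.
enum SketchTaskType { MAP, REDUCE }

final class SketchTaskReport {
    final String taskId;
    final SketchTaskType type;
    final String state;

    SketchTaskReport(String taskId, SketchTaskType type, String state) {
        this.taskId = taskId;
        this.type = type;
        this.state = state;
    }
}

// The two proposed calls, modelled as a hypothetical interface.
interface TaskReportSource {
    List<String> getTasks(SketchTaskType type);     // ids only
    SketchTaskReport getTaskReport(String taskId);  // one report per call
}

// In-memory stub so the sketch runs without a cluster.
final class InMemoryTaskReportSource implements TaskReportSource {
    private final Map<String, SketchTaskReport> reports = new LinkedHashMap<>();

    void add(SketchTaskReport report) {
        reports.put(report.taskId, report);
    }

    @Override
    public List<String> getTasks(SketchTaskType type) {
        List<String> ids = new ArrayList<>();
        for (SketchTaskReport report : reports.values()) {
            if (report.type == type) {
                ids.add(report.taskId);
            }
        }
        return ids;
    }

    @Override
    public SketchTaskReport getTaskReport(String taskId) {
        return reports.get(taskId);
    }
}

final class FailedTaskCounter {
    // Lists ids first, then fetches reports one at a time, keeping only a counter.
    static int countFailed(TaskReportSource job, SketchTaskType type) {
        int failed = 0;
        for (String taskId : job.getTasks(type)) {
            SketchTaskReport report = job.getTaskReport(taskId);
            if (report != null && "FAILED".equals(report.state)) {
                failed++;
            }
        }
        return failed;
    }
}
```

With this shape the caller never holds more than one report at a time, at the cost of one request per task; whether that trade-off is acceptable for very large jobs would need to be measured.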
For the JHS, since JobHistoryParser.parse already parses the full history file
by default and maintains the tasks in JobHistoryParser.JobInfo's tasksMap, this
information should be easy to retrieve.
One additional thing to check is whether this can also be supported for
requests that are redirected to MRClientService (within MRAppMaster) for
running jobs.