[
https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oleksandr Shevchenko reassigned MAPREDUCE-7133:
-----------------------------------------------
Assignee: Oleksandr Shevchenko
> History Server task attempts REST API returns invalid data
> ----------------------------------------------------------
>
> Key: MAPREDUCE-7133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Reporter: Oleksandr Shevchenko
> Assignee: Oleksandr Shevchenko
> Priority: Major
> Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch,
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch
>
>
> When we send a request to the History Server with the header Accept:
> application/json
> https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts
>
> we get the following JSON:
> {code:java}
> {
>   "taskAttempts": {
>     "taskAttempt": [{
>       "type": "reduceTaskAttemptInfo",
>       "startTime": 1535372984638,
>       "finishTime": 1535372986149,
>       "elapsedTime": 1511,
>       "progress": 100.0,
>       "id": "attempt_1535363926925_0040_r_000003_0",
>       "rack": "/default-rack",
>       "state": "SUCCEEDED",
>       "status": "reduce > reduce",
>       "nodeHttpAddress": "node2.cluster.com:8044",
>       "diagnostics": "",
>       "type": "REDUCE",
>       "assignedContainerId": "container_e01_1535363926925_0040_01_000006",
>       "shuffleFinishTime": 1535372986056,
>       "mergeFinishTime": 1535372986075,
>       "elapsedShuffleTime": 1418,
>       "elapsedMergeTime": 19,
>       "elapsedReduceTime": 74
>     }]
>   }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, because the JSON is not
> valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempts>
>   <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>                xsi:type="reduceTaskAttemptInfo">
>     <startTime>1535372984638</startTime>
>     <finishTime>1535372986149</finishTime>
>     <elapsedTime>1511</elapsedTime>
>     <progress>100.0</progress>
>     <id>attempt_1535363926925_0040_r_000003_0</id>
>     <rack>/default-rack</rack>
>     <state>SUCCEEDED</state>
>     <status>reduce > reduce</status>
>     <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>     <diagnostics/>
>     <type>REDUCE</type>
>     <assignedContainerId>container_e01_1535363926925_0040_01_000006</assignedContainerId>
>     <shuffleFinishTime>1535372986056</shuffleFinishTime>
>     <mergeFinishTime>1535372986075</mergeFinishTime>
>     <elapsedShuffleTime>1418</elapsedShuffleTime>
>     <elapsedMergeTime>19</elapsedMergeTime>
>     <elapsedReduceTime>74</elapsedReduceTime>
>   </taskAttempt>
> </taskAttempts>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:type="reduceTaskAttemptInfo">
> {code}
> We get an "xsi:type" attribute, which is later incorrectly marshalled into a
> duplicated field when we use the JSON format.
> This applies only to REDUCE task attempts. For a MAP task we get XML without
> the "xsi:type" attribute:
> {code:java}
> <taskAttempts>
>   <taskAttempt>
>     <startTime>1535370756528</startTime>
>     <finishTime>1535370760318</finishTime>
>     <elapsedTime>3790</elapsedTime>
>     <progress>100.0</progress>
>     <id>attempt_1535363926925_0029_m_000001_0</id>
>     <rack>/default-rack</rack>
>     <state>SUCCEEDED</state>
>     <status>map > sort</status>
>     <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>     <diagnostics/>
>     <type>MAP</type>
>     <assignedContainerId>container_e01_1535363926925_0029_01_000003</assignedContainerId>
>   </taskAttempt>
> </taskAttempts>
> {code}
> This happens because we have two classes in a hierarchy: MAP attempts are
> represented by TaskAttemptInfo and REDUCE attempts by ReduceTaskAttemptInfo,
> which extends TaskAttemptInfo. Later we marshal all attempts (map and reduce)
> via TaskAttemptsInfo.getTaskAttempt(). At that point we have no information
> about the ReduceTaskAttemptInfo type, because all attempts are stored in an
> ArrayList<TaskAttemptInfo>.
> During marshalling the marshaller sees that the actual type is
> ReduceTaskAttemptInfo rather than the declared TaskAttemptInfo and adds type
> metadata for it. That is why we get the duplicated field.
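> The snippet below is a minimal, self-contained sketch of that behaviour; the
> classes Attempts, Base and Reduce are hypothetical stand-ins for
> TaskAttemptsInfo, TaskAttemptInfo and ReduceTaskAttemptInfo, not the real
> beans. When the declared element type is the base class but the runtime object
> is the subclass, JAXB records the subtype as an xsi:type attribute, which is
> what later collides with the existing "type" element in the JSON
> representation.
> {code:java}
> import java.util.Arrays;
> import java.util.List;
> import javax.xml.bind.JAXBContext;
> import javax.xml.bind.Marshaller;
> import javax.xml.bind.annotation.*;
>
> public class XsiTypeDemo {
>
>   @XmlRootElement(name = "taskAttempts")
>   @XmlAccessorType(XmlAccessType.FIELD)
>   public static class Attempts {
>     @XmlElement(name = "taskAttempt")
>     public List<Base> attempts;        // declared element type is the base class
>   }
>
>   @XmlSeeAlso(Reduce.class)
>   @XmlAccessorType(XmlAccessType.FIELD)
>   public static class Base {
>     public String type = "MAP";        // the real bean also exposes a "type" field
>   }
>
>   @XmlAccessorType(XmlAccessType.FIELD)
>   public static class Reduce extends Base {
>     public long shuffleFinishTime = 1L;
>   }
>
>   public static void main(String[] args) throws Exception {
>     Attempts attempts = new Attempts();
>     Reduce reduce = new Reduce();
>     reduce.type = "REDUCE";
>     attempts.attempts = Arrays.asList(reduce); // runtime type is the subclass
>
>     Marshaller m = JAXBContext.newInstance(Attempts.class).createMarshaller();
>     m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
>     // The <taskAttempt> element gets an xsi:type attribute because the runtime
>     // type (Reduce) differs from the declared type (Base).
>     m.marshal(attempts, System.out);
>   }
> }
> {code}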
> Unfortunately we did not catch this earlier in TestHsWebServicesAttempts,
> because it uses the org.codehaus.jettison.json.JSONObject library, which
> silently overrides duplicated fields. Even when we send the request with
> Postman we appear to get valid JSON; only when we switch the response view to
> Raw can we notice this issue. We are, however, able to reproduce the bug with
> the "org.json:json" library, with something like this:
> {code:java}
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import org.json.JSONObject;
>
> // "connection" is an already opened HttpURLConnection for the task attempts
> // URL with the Accept: application/json request header set.
> BufferedReader inReader =
>     new BufferedReader(new InputStreamReader(connection.getInputStream()));
> String inputLine;
> StringBuilder response = new StringBuilder();
> while ((inputLine = inReader.readLine()) != null) {
>   response.append(inputLine);
> }
> inReader.close();
> // org.json's JSONObject rejects duplicated keys, so this line throws a
> // JSONException on the duplicated "type" field.
> JSONObject o = new JSONObject(response.toString());
> {code}
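> For comparison, the sketch below (my own trimmed example, not code from the
> test) shows why jettison hides the problem: it accepts the duplicated key, one
> value simply overriding the other, so the existing assertions still pass.
> {code:java}
> import org.codehaus.jettison.json.JSONObject;
>
> public class JettisonDuplicateKeyDemo {
>   public static void main(String[] args) throws Exception {
>     // Trimmed stand-in for the real response body with the duplicated "type" key.
>     String body =
>         "{\"taskAttempt\":{\"type\":\"reduceTaskAttemptInfo\",\"type\":\"REDUCE\"}}";
>     JSONObject o = new JSONObject(body); // parses without complaint
>     // Only a single "type" value survives, so the duplication goes unnoticed.
>     System.out.println(o.getJSONObject("taskAttempt").getString("type"));
>   }
> }
> {code}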