[ https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613238#comment-16613238 ]
Oleksandr Shevchenko commented on MAPREDUCE-7133:
-------------------------------------------------

Thank you [~jlowe] for the clarification. Good point about backporting to previous versions.
I have attached a new patch and created a separate ticket for the refactoring, MAPREDUCE-7140 (I will attach the patch for it after MAPREDUCE-7133 is merged).

> History Server task attempts REST API returns invalid data
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7133
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Oleksandr Shevchenko
>            Assignee: Oleksandr Shevchenko
>            Priority: Major
>         Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch,
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch, MAPREDUCE-7133.005.patch
>
>
> When we send a request to the History Server with the header Accept: application/json to
> [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_000003/attempts]
> we get the following JSON:
> {code:java}
> {
>   "taskAttempts": {
>     "taskAttempt": [{
>       "type": "reduceTaskAttemptInfo",
>       "startTime": 1535372984638,
>       "finishTime": 1535372986149,
>       "elapsedTime": 1511,
>       "progress": 100.0,
>       "id": "attempt_1535363926925_0040_r_000003_0",
>       "rack": "/default-rack",
>       "state": "SUCCEEDED",
>       "status": "reduce > reduce",
>       "nodeHttpAddress": "node2.cluster.com:8044",
>       "diagnostics": "",
>       "type": "REDUCE",
>       "assignedContainerId": "container_e01_1535363926925_0040_01_000006",
>       "shuffleFinishTime": 1535372986056,
>       "mergeFinishTime": 1535372986075,
>       "elapsedShuffleTime": 1418,
>       "elapsedMergeTime": 19,
>       "elapsedReduceTime": 74
>     }]
>   }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, since strict parsers reject JSON with duplicate keys.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempts>
>   <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="reduceTaskAttemptInfo">
>     <startTime>1535372984638</startTime>
>     <finishTime>1535372986149</finishTime>
>     <elapsedTime>1511</elapsedTime>
>     <progress>100.0</progress>
>     <id>attempt_1535363926925_0040_r_000003_0</id>
>     <rack>/default-rack</rack>
>     <state>SUCCEEDED</state>
>     <status>reduce > reduce</status>
>     <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>     <diagnostics/>
>     <type>REDUCE</type>
>     <assignedContainerId>container_e01_1535363926925_0040_01_000006</assignedContainerId>
>     <shuffleFinishTime>1535372986056</shuffleFinishTime>
>     <mergeFinishTime>1535372986075</mergeFinishTime>
>     <elapsedShuffleTime>1418</elapsedShuffleTime>
>     <elapsedMergeTime>19</elapsedMergeTime>
>     <elapsedReduceTime>74</elapsedReduceTime>
>   </taskAttempt>
> </taskAttempts>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="reduceTaskAttemptInfo">
> {code}
> We get an "xsi:type" attribute, which is then incorrectly marshalled into a duplicated "type" field when the JSON format is used.
> This applies only to REDUCE tasks. For a MAP task we get XML without the "xsi:type" attribute:
> {code:java}
> <taskAttempts>
>   <taskAttempt>
>     <startTime>1535370756528</startTime>
>     <finishTime>1535370760318</finishTime>
>     <elapsedTime>3790</elapsedTime>
>     <progress>100.0</progress>
>     <id>attempt_1535363926925_0029_m_000001_0</id>
>     <rack>/default-rack</rack>
>     <state>SUCCEEDED</state>
>     <status>map > sort</status>
>     <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>     <diagnostics/>
>     <type>MAP</type>
>     <assignedContainerId>container_e01_1535363926925_0029_01_000003</assignedContainerId>
>   </taskAttempt>
> </taskAttempts>
> {code}
> This happens because MAP and REDUCE task attempts are represented by two classes in one hierarchy: TaskAttemptInfo for MAP and ReduceTaskAttemptInfo for REDUCE.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and later we marshal all task attempts (map and reduce) through TaskAttemptsInfo.getTaskAttempt(). At that point we have no information about the ReduceTaskAttemptInfo type, because all attempts are stored in an ArrayList<TaskAttemptInfo>.
> During marshalling, the marshaller sees that the actual type of the object is ReduceTaskAttemptInfo rather than the declared TaskAttemptInfo and adds type metadata for it. That is why we get the duplicated field.
> Unfortunately, we did not catch this earlier in TestHsWebServicesAttempts because the test uses org.codehaus.jettison.json.JSONObject, which silently overwrites duplicated fields. Even Postman shows valid JSON for this request; the issue is only noticeable after switching the response view to Raw.
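Since a lenient parser hides the problem, one way to catch it in a test is to scan the raw response body for keys that repeat within the same object. The following is only a minimal sketch using the JDK alone: the class and method names (DuplicateKeyCheck, findDuplicateKeys) are hypothetical and not part of Hadoop or of any attached patch, and the scanner is not a full JSON parser (it assumes well-formed input and only tracks nesting and string literals).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateKeyCheck {

    /** Returns every key that occurs more than once within the same JSON object. */
    public static List<String> findDuplicateKeys(String json) {
        List<String> duplicates = new ArrayList<>();
        // One key set per open '{' or '['; arrays get a set too so the stack stays balanced.
        Deque<Set<String>> scopes = new ArrayDeque<>();
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c == '{' || c == '[') {
                scopes.push(new HashSet<>());
                i++;
            } else if (c == '}' || c == ']') {
                if (!scopes.isEmpty()) {
                    scopes.pop();
                }
                i++;
            } else if (c == '"') {
                // Find the closing quote, skipping escaped characters.
                int end = i + 1;
                while (end < json.length() && json.charAt(end) != '"') {
                    end += (json.charAt(end) == '\\') ? 2 : 1;
                }
                String text = json.substring(i + 1, Math.min(end, json.length()));
                // A string literal followed by ':' is an object key.
                int next = end + 1;
                while (next < json.length() && Character.isWhitespace(json.charAt(next))) {
                    next++;
                }
                boolean isKey = next < json.length() && json.charAt(next) == ':';
                if (isKey && !scopes.isEmpty() && !scopes.peek().add(text)) {
                    duplicates.add(text);
                }
                i = end + 1;
            } else {
                i++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Shortened version of the response body quoted in this issue.
        String body = "{\"taskAttempts\":{\"taskAttempt\":[{"
            + "\"type\":\"reduceTaskAttemptInfo\",\"state\":\"SUCCEEDED\","
            + "\"status\":\"reduce > reduce\",\"type\":\"REDUCE\"}]}}";
        System.out.println(findDuplicateKeys(body)); // prints [type]
    }
}
```

A test could assert that findDuplicateKeys returns an empty list for the raw body before handing it to a JSON library, so the check does not depend on how that library treats repeated keys.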
> We can also reproduce this bug with the "org.json:json" library, with something like this:
> {code:java}
> // "connection" is assumed to be an open HttpURLConnection to the attempts URL.
> StringBuilder response = new StringBuilder();
> try (BufferedReader inReader = new BufferedReader(
>     new InputStreamReader(connection.getInputStream()))) {
>   String inputLine;
>   while ((inputLine = inReader.readLine()) != null) {
>     response.append(inputLine);
>   }
> }
> // Parsing fails here because of the duplicated "type" key.
> JSONObject o = new JSONObject(response.toString());
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)