[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150808#comment-13150808 ]
Thomas Graves commented on MAPREDUCE-2863:
------------------------------------------

{quote}
amoutput: (JSON output) Not sure if this is really an issue, but "job" and "task" in one set point to an array of objects and in another point to an object (/jobs vs /job, and likewise for /tasks and /task). I believe this is the case for most multi-object vs single-object GET calls.
{quote}

Right, one is an array of task objects and the other is just a single task object. In the JSON case the name tells you the type of the objects in the array. I've seen that convention used elsewhere for REST output, so I followed it, and it is also what JAXB gives me. I don't see another way to do it if you want the XML and the JSON to match, since the JSON doesn't create an array like [ task: {}, task: {}]. If I renamed the JSON array to something like tasksarray, then each element in the XML output would also become tasksarray. If you or anyone else has ideas, or really doesn't like this, let me know.

{quote}
rmoutput: do we need to support some form of pagination of the responses? We can probably expect hundreds of nodes and apps; when making an /apps call, do we intend to dump out all the data? For now, as this is the first version, I believe we can look at addressing this issue in a later version, probably following the same approach the web UI used in previous versions for handling large amounts of info.
{quote}

The only thing currently implemented is the query parameters: some calls take the limit query parameter and others filter by start/finish time. I figured we could expand on this later. The issue I see is that on a busy cluster things could change so fast that, unless you have something else to go by (such as start/finish time), just paging by page or item number won't give you consistent results. Perhaps for nodes that isn't quite as bad.

{quote}
Any ideas on how the folks in SE/operations would plan to use these APIs?
It might be worth discussing the output formats with them to understand whether the main use cases can be met without incurring a lot of calls/overhead. If, in general, the use case is to get detailed info on all apps, this will result in many more calls to the RM, which will be a performance overhead. In addition to the ids, it might make sense to return some minimal additional useful info in the full listing so that detailed-info calls are not required.
{quote}

Good point, and this is why I have gone back and forth a bit. I thought the original approach of outputting all the details might be a lot of possibly unneeded data, but, as you say, returning just the list of ids might require many more queries. I should try to load test it. I would hope people are at least using keep-alive on their connection if they turn around and make more queries.

I have talked to various people, and it seems different people want different things. Certain ops folks definitely need a subset of the job details, so they would turn around and query each of the job ids returned. We could perhaps support both via a query parameter, something like:

/apps -> by default gives you the list of app ids: [appid1, appid2, appid3]
/apps?attrs=* -> gives you the full list of app details: [ {id: 1, state: started...}, {id: 2, state: started...}, ...]
/apps?attrs=id,state,queue -> gives you the subset of fields requested. This could be implemented in the future; for now just support *.

The downside is that the output format would change based on the query parameter. If we didn't want the output type to change, we could return only the id by default, like [{id: 1}, {id: 2}, ...]; it just wouldn't be quite as friendly to iterate over. Thoughts on that?
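A minimal Java sketch of the proposed attrs behavior may help when weighing the output-format question. This is not code from the patch and does not use the real Hadoop or JAXB classes: the app record (a map with hypothetical id/state/queue/user fields), the hand-rolled JSON rendering (no escaping) and the parsing of attrs are all assumptions made for illustration. It shows the three cases from the proposal: no attrs returns bare ids, attrs=* returns full objects, and a comma-separated list returns only those fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the proposed /apps "attrs" query parameter.
// Not Hadoop code: the app model, field names and JSON rendering are assumptions.
final class AppsAttrsSketch {

    /** Stand-in for an application record; LinkedHashMap keeps field order stable. */
    static Map<String, String> app(String id, String state, String queue, String user) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("id", id);
        m.put("state", state);
        m.put("queue", queue);
        m.put("user", user);
        return m;
    }

    /**
     * Renders the /apps response for a given attrs value:
     *   null or empty -> ["id1","id2",...]          (ids only)
     *   "*"           -> [{all fields}, ...]        (full details)
     *   "a,b,c"       -> [{requested fields}, ...]  (subset, in record order;
     *                                                unknown names are skipped)
     */
    static String render(List<Map<String, String>> apps, String attrs) {
        StringJoiner out = new StringJoiner(",", "[", "]");
        if (attrs == null || attrs.isEmpty()) {
            for (Map<String, String> a : apps) {
                out.add(quote(a.get("id")));
            }
            return out.toString();
        }
        List<String> wanted = null; // null means "all fields" (attrs=*)
        if (!attrs.equals("*")) {
            wanted = new ArrayList<>();
            for (String f : attrs.split(",")) {
                wanted.add(f.trim());
            }
        }
        for (Map<String, String> a : apps) {
            StringJoiner obj = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : a.entrySet()) {
                if (wanted == null || wanted.contains(e.getKey())) {
                    obj.add(quote(e.getKey()) + ":" + quote(e.getValue()));
                }
            }
            out.add(obj.toString());
        }
        return out.toString();
    }

    // No escaping: assumes keys and values contain no quotes or backslashes.
    private static String quote(String s) {
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        List<Map<String, String>> apps = List.of(
                app("app_1", "RUNNING", "default", "alice"),
                app("app_2", "FINISHED", "research", "bob"));
        System.out.println(render(apps, null));       // ids only
        System.out.println(render(apps, "*"));        // full details
        System.out.println(render(apps, "id,state")); // subset
    }
}
```

Note that the no-attrs case returns a different JSON type (an array of strings) than the other two (arrays of objects), which is the downside raised above. The alternative of returning [{id: ...}] by default is what render(apps, "id") produces in this sketch, so the response stays an array of objects in every case.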
> Support web-services for RM & NM
> --------------------------------
>
>                 Key: MAPREDUCE-2863
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2, nodemanager, resourcemanager
>    Affects Versions: 0.23.0
>            Reporter: Arun C Murthy
>            Assignee: Thomas Graves
>            Priority: Blocker
>         Attachments: MAPREDUCE-2863.patch, amoutput.txt, nmoutput.txt, nmoutput.txt, rmoutput.txt, rmoutput.txt
>
>
> It will be very useful for RM and NM to support web-services to export
> json/xml.