[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM & NM

Thomas Graves (Commented) (JIRA) Tue, 15 Nov 2011 13:50:18 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150808#comment-13150808 ]


Thomas Graves commented on MAPREDUCE-2863:
------------------------------------------

{quote}
amoutput:

    (JSON output) Not sure if this is really an issue, but "job" and "task" in 
one set point to an array of objects and in another point to an object (/jobs 
vs /job, and likewise /tasks vs /task). I believe this is the case for 
most multi-object vs single-object GET calls.

{quote}

Right, one is an array of task objects and the other is just a single task 
object. In the JSON case, the key tells you the type of object in the array. 
I've seen that used elsewhere for REST output, so I followed the convention, 
and it is what JAXB gives me. I don't see how else to do it if you want the XML 
and the JSON to match, since the JSON doesn't create an array like 
[ task: {}, task: {}]. If I rename the JSON array to, say, tasksarray, then each 
object in the XML output becomes tasksarray as well. If you or anyone else has 
ideas, or really doesn't like this, let me know.
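A minimal sketch of the two shapes being discussed, assuming a hypothetical task with only `id` and `state` fields. It builds the strings by hand rather than through JAXB, so it only illustrates the mapping: XML repeats a `<task>` element inside `<tasks>`, and because a JSON object cannot repeat a key, the repeated element name becomes the key of an array. Values are assumed not to need escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TaskShapes {
    // Hypothetical, minimal task; real task objects carry many more fields.
    record Task(String id, String state) {}

    static String taskJson(Task t) {
        return "{\"id\":\"" + t.id() + "\",\"state\":\"" + t.state() + "\"}";
    }

    static String taskXml(Task t) {
        return "<task><id>" + t.id() + "</id><state>" + t.state() + "</state></task>";
    }

    // Single-object endpoint (e.g. /task): "task" maps to one object.
    static String singleJson(Task t) {
        return "{\"task\":" + taskJson(t) + "}";
    }

    // Collection endpoint (e.g. /tasks): XML repeats <task>; JSON cannot repeat
    // a key, so the element name "task" becomes the key of an array.
    static String listJson(List<Task> tasks) {
        String items = tasks.stream().map(TaskShapes::taskJson).collect(Collectors.joining(","));
        return "{\"tasks\":{\"task\":[" + items + "]}}";
    }

    static String listXml(List<Task> tasks) {
        String items = tasks.stream().map(TaskShapes::taskXml).collect(Collectors.joining());
        return "<tasks>" + items + "</tasks>";
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(new Task("task_1", "RUNNING"), new Task("task_2", "SUCCEEDED"));
        System.out.println(singleJson(tasks.get(0)));
        System.out.println(listJson(tasks));
        System.out.println(listXml(tasks));
    }
}
```

Renaming the JSON array key here would require renaming the repeated XML element too, which is the coupling described above.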


{quote}
rmoutput:

    Do we need to support some form of pagination of the responses?
    We can probably expect hundreds of nodes and apps; when making an /apps 
call, do we intend to dump out all the data? As this is the first version, I 
believe we can address this in a later version, probably following the same 
approach the web UI used in previous versions to handle large amounts of info.
{quote}
The only thing currently implemented is query parameters: some of the calls 
take a limit query parameter and others filter by start/finish time. I figured 
we could expand on it later. The issue I see here is that on a busy cluster 
things could change so fast that, unless you have something else to go by (such 
as start/finish time), paging by page number or count won't give you consistent 
results. For nodes that is perhaps not quite as bad.
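A minimal sketch of the consistency problem, assuming hypothetical app records listed newest-first and a hypothetical `startedBefore` cursor (not a parameter of the current API). With offset/limit paging, an app that starts between two calls shifts every later page, so an item comes back twice; paging on the last start time seen does not shift.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PagingSketch {
    // Hypothetical app summary: id plus start time in milliseconds.
    record App(String id, long startTime) {}

    // Newest-first view, as a listing endpoint might return it.
    static List<App> newestFirst(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(Comparator.comparingLong(App::startTime).reversed());
        return sorted;
    }

    // Offset/limit paging: page boundaries move when new apps arrive.
    static List<App> pageByOffset(List<App> apps, int offset, int limit) {
        List<App> sorted = newestFirst(apps);
        int from = Math.min(offset, sorted.size());
        int to = Math.min(from + limit, sorted.size());
        return sorted.subList(from, to);
    }

    // Time-based paging: return apps started strictly before the cursor.
    static List<App> pageByStartTime(List<App> apps, long startedBefore, int limit) {
        return newestFirst(apps).stream()
                .filter(a -> a.startTime() < startedBefore)
                .limit(limit)
                .toList();
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>(List.of(
                new App("app_1", 100), new App("app_2", 200),
                new App("app_3", 300), new App("app_4", 400)));

        List<App> firstPage = pageByOffset(apps, 0, 2);   // app_4, app_3
        apps.add(new App("app_5", 500));                  // a new app starts between calls

        // Offset paging: the second page starts with app_3 again.
        System.out.println(pageByOffset(apps, 2, 2));
        // Cursor on the last start time seen: continues with app_2, app_1.
        long cursor = firstPage.get(firstPage.size() - 1).startTime();
        System.out.println(pageByStartTime(apps, cursor, 2));
    }
}
```

Apps with identical start times would need a secondary key (for example the id) in the cursor to avoid skipping ties.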

{quote}
Any ideas on how the folks in SE/operations plan to use these APIs? It 
might be worth discussing the output formats with them to understand whether the 
main use cases can be met without incurring a lot of calls/overhead. If, in 
general, the use case is to get detailed info on all apps, this will result in 
many more calls to the RM, which adds performance overhead. In addition to 
the ids, it might make sense to return some minimal additional/useful info in 
the full listing so that detailed info calls are not required.
{quote}

Good point; this is why I have gone back and forth a bit. I thought the 
original approach of outputting all the details might return a lot of possibly 
unneeded data, but, as you say, returning just the list of ids might require a 
lot more queries. I should try to load test it. I would hope people are at least 
using keep-alive on their connections if they are turning around and doing more 
queries. I have talked to various people, and it seems different people want 
different things. Certain ops folks definitely need a subset of the job 
details, so they would turn around and query each of the job ids returned. We 
could perhaps support both via a query parameter, say something like:

/apps -> by default, returns a list of app ids [appid1, appid2, appid3]
/apps?attrs=* -> returns the full list of app details [ {id: 1, state: started...}, {id: 2, state: started...}, ...]
/apps?attrs=id,state,queue -> returns only the requested subset of fields. This 
could be implemented in the future; for now we would just support *.
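A minimal sketch of the proposed `attrs` handling, assuming each app is a flat map of field names to values (hypothetical; the real app objects are JAXB beans). The method returns `Object` precisely because the shape depends on the parameter: a list of ids by default, the full records for `*`, and per-app field subsets otherwise. Unknown field names are silently dropped here; rejecting them with an error would be an equally reasonable choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class AttrsSketch {
    // Return type is Object because the output shape varies with 'attrs'.
    static Object listApps(List<Map<String, Object>> apps, String attrs) {
        if (attrs == null || attrs.isBlank()) {
            // /apps: just the ids
            return apps.stream().map(a -> a.get("id")).collect(Collectors.toList());
        }
        if (attrs.trim().equals("*")) {
            // /apps?attrs=*: full details
            return apps;
        }
        // /apps?attrs=id,state,queue: requested fields, in the order requested
        Set<String> wanted = Arrays.stream(attrs.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
        return apps.stream().map(app -> {
            Map<String, Object> subset = new LinkedHashMap<>();
            for (String field : wanted) {
                if (app.containsKey(field)) {   // unknown fields are skipped
                    subset.put(field, app.get(field));
                }
            }
            return subset;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> apps = List.of(
                Map.<String, Object>of("id", "app_1", "state", "RUNNING", "queue", "default"),
                Map.<String, Object>of("id", "app_2", "state", "FINISHED", "queue", "prod"));
        System.out.println(listApps(apps, null));
        System.out.println(listApps(apps, "*"));
        System.out.println(listApps(apps, "id,state"));
    }
}
```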

The downside is that the output format would change based on the query param.

If we didn't want to change the output type, we could return only the id by 
default, like [{id: 1}, {id: 2}, ...]; it just wouldn't be quite as friendly to 
iterate over.

Thoughts on that?


                
> Support web-services for RM & NM
> --------------------------------
>
>                 Key: MAPREDUCE-2863
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2, nodemanager, resourcemanager
>    Affects Versions: 0.23.0
>            Reporter: Arun C Murthy
>            Assignee: Thomas Graves
>            Priority: Blocker
>         Attachments: MAPREDUCE-2863.patch, amoutput.txt, nmoutput.txt, 
> nmoutput.txt, rmoutput.txt, rmoutput.txt
>
>
> It will be very useful for RM and NM to support web-services to export 
> json/xml.

