[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796228#comment-13796228
 ] 

Sandy Ryza commented on MAPREDUCE-5577:
---------------------------------------

The goal is to make things easier for clients that are trying to track all jobs 
that go through the JHS.  Without this, each query must cover the longest 
delay that could conceivably separate a job's finish time from its arrival at 
the JHS (which could be minutes with things like GC pauses).  This means a lot 
of redundant job data gets transferred, and more work for the client, which 
must keep track of all the jobs it has received in that interval to filter out 
what's new.
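
To make the difference concrete, here is a minimal sketch of the two polling 
styles.  It is only an illustration: the Job class, the in-memory list standing 
in for the JHS, and all method names are hypothetical and are not part of the 
JobHistoryServer REST API or of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ArrivalTimePolling {
    /** Hypothetical client-side view of a job; not a JHS type. */
    static final class Job {
        final String id;
        final long finishTime;   // when the job finished (ms)
        final long arrivalTime;  // when its history file landed in the done directory (ms)

        Job(String id, long finishTime, long arrivalTime) {
            this.id = id;
            this.finishTime = finishTime;
            this.arrivalTime = arrivalTime;
        }
    }

    /**
     * Stand-in for a finish-time query issued at time 'now': returns jobs
     * that have already arrived and whose finish time is in [begin, end].
     */
    static List<Job> queryByFinishTime(List<Job> all, long begin, long end, long now) {
        List<Job> out = new ArrayList<>();
        for (Job j : all) {
            if (j.arrivalTime <= now && j.finishTime >= begin && j.finishTime <= end) {
                out.add(j);
            }
        }
        return out;
    }

    /**
     * Stand-in for the proposed arrival-time query: returns jobs whose
     * arrival time is in (begin, end], where begin is the last query time.
     */
    static List<Job> queryByArrivalTime(List<Job> all, long begin, long end) {
        List<Job> out = new ArrayList<>();
        for (Job j : all) {
            if (j.arrivalTime > begin && j.arrivalTime <= end) {
                out.add(j);
            }
        }
        return out;
    }

    /**
     * Bookkeeping that finish-time polling forces on the client: drop jobs
     * whose ids are already in 'seen' and record the ids of the new ones.
     */
    static List<Job> filterNew(List<Job> fetched, Set<String> seen) {
        List<Job> fresh = new ArrayList<>();
        for (Job j : fetched) {
            if (seen.add(j.id)) {  // add() returns false for an id already present
                fresh.add(j);
            }
        }
        return fresh;
    }
}
```

With finish-time polling, the client calls queryByFinishTime over 
[now - maxDelay, now] on every poll and passes the result through filterNew, 
so any job that stays inside the window is transferred again each time.  With 
arrival-time polling, it calls queryByArrivalTime over (lastQuery, now] and 
needs no seen set, because consecutive intervals do not overlap.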


> Allow querying the JobHistoryServer by job arrival time
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5577
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5577.patch
>
>
> The JobHistoryServer REST APIs currently allow querying by job submit time 
> and finish time.  However, jobs don't necessarily arrive in order of their 
> finish time, meaning that a client who wants to stay on top of all completed 
> jobs needs to query large time intervals to make sure they're not missing 
> anything.  Exposing functionality to allow querying by the time a job lands 
> at the JobHistoryServer would allow clients to set the start of their query 
> interval to the time of their last query. 
> The arrival time of a job would be defined as the time that it lands in the 
> done directory and can be picked up using the last modified date on history 
> files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
