[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798091#comment-13798091
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5577:
----------------------------------------------------

Okay, that gives me a little more detail, but I still don't get it fully.

Let's say the client is looking for finished jobs and the window it uses is an 
hour. It first asks JHS for jobs that finished between 3PM and 4PM. JHS returns 
three jobs that finished at 3:01, 3:03, and 3:15. Now all the client needs to 
do is move the window and ask for jobs that finished between 3:15PM and 
4:15PM, no?
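
To make the scenario concrete, here is a minimal sketch of that polling loop. 
It is only an illustration: WindowPoller, Job, getJobs() and nextBegin() are 
hypothetical names, the job list is an in-memory stand-in for JHS, and times 
are kept as minutes after midnight for readability.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowPoller {

    /** Hypothetical job record: an id plus the finish time reported by the AM. */
    public static final class Job {
        public final String id;
        public final long finishMinute; // minutes after midnight (assumption, for readability)

        public Job(String id, long finishMinute) {
            this.id = id;
            this.finishMinute = finishMinute;
        }
    }

    /** Stand-in for a finish-time query: jobs with begin <= finish time <= end. */
    public static List<Job> getJobs(List<Job> known, long begin, long end) {
        List<Job> out = new ArrayList<>();
        for (Job j : known) {
            if (j.finishMinute >= begin && j.finishMinute <= end) {
                out.add(j);
            }
        }
        return out;
    }

    /** Next window start: the latest finish time returned, or the old start if none. */
    public static long nextBegin(List<Job> returned, long previousBegin) {
        long next = previousBegin;
        for (Job j : returned) {
            next = Math.max(next, j.finishMinute);
        }
        return next;
    }

    public static void main(String[] args) {
        List<Job> known = new ArrayList<>();
        known.add(new Job("job_a", 15 * 60 + 1));  // finished 3:01PM
        known.add(new Job("job_b", 15 * 60 + 3));  // finished 3:03PM
        known.add(new Job("job_c", 15 * 60 + 15)); // finished 3:15PM

        long begin = 15 * 60; // 3PM
        long width = 60;      // one-hour window
        List<Job> first = getJobs(known, begin, begin + width);
        begin = nextBegin(first, begin);
        System.out.println("next window: " + begin + " to " + (begin + width));
    }
}
```

Because the bounds are inclusive in this sketch, the job that finished at 3:15 
comes back again on the next query, so a client would need to de-duplicate by 
job id. The sketch also does not model a job with an earlier finish time 
showing up at JHS after the window has moved, which is the case the issue 
description is concerned about.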

One significant point is that finish-time is set by AMs. But every time the 
getJobs() API is called, JHS scans the intermediate done directory and 
populates its cache. So, when a client asks for finished jobs between 3PM and 4PM, 
it is guaranteed to get any finished jobs in that duration. It seems like you 
are hinting that this contract is broken. Is it?

Also in the patch, I don't see where aBegin and aEnd are getting used. In 
CachedHistoryStorage, aBegin and aEnd are validated but never used.

It does look like I'm missing something, please bear with me. I'm just making 
sure we are doing the right thing. The same API, if needed here, can also be 
added to the Application History Server, which I am watching over. Thanks.

> Allow querying the JobHistoryServer by job arrival time
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5577
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5577.patch
>
>
>   The JobHistoryServer REST APIs currently allow querying by job submit time 
> and finish time.  However, jobs don't necessarily arrive in order of their 
> finish time, meaning that a client who wants to stay on top of all completed 
> jobs needs to query large time intervals to make sure they're not missing 
> anything.  Exposing functionality to allow querying by the time a job lands 
> at the JobHistoryServer would allow clients to set the start of their query 
> interval to the time of their last query. 
> The arrival time of a job would be defined as the time that it lands in the 
> done directory and can be picked up using the last modified date on history 
> files.
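
A rough sketch of the arrival-time idea from the description above, offered 
only as an illustration. It uses java.nio.file as a stand-in for the Hadoop 
FileSystem API, and ArrivalTimeFilter and arrivedBetween() are hypothetical 
names, not taken from the attached patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ArrivalTimeFilter {

    /**
     * Returns the files in doneDir whose last-modified time falls in
     * [beginMillis, endMillis). The last-modified time is treated as the
     * "arrival time" of the history file (assumption based on the issue text).
     */
    public static List<Path> arrivedBetween(Path doneDir, long beginMillis, long endMillis)
            throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(doneDir)) {
            for (Path p : stream) {
                long mtime = Files.getLastModifiedTime(p).toMillis();
                if (mtime >= beginMillis && mtime < endMillis) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result); // stable ordering for callers; directory order is unspecified
        return result;
    }
}
```

With a half-open interval like this, a client could pass the end of its 
previous query as the next beginMillis without seeing the same file twice.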



--
This message was sent by Atlassian JIRA
(v6.1#6144)
