[ https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798091#comment-13798091 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-5577:
----------------------------------------------------

Okay, that gives me a little more detail, but I still don't get it fully. Let's say the client is looking for finished jobs and the window it uses is an hour. It first asks the JHS for jobs that finished from 3PM to 4PM. The JHS returns three jobs that finished at 3:01, 3:03 and 3:15. Now all the client needs to do is move the window ahead and ask for jobs that finished between 3:15PM and 4:15PM, no?

One significant point is that finish-time is set by AMs. But every time the getJobs() API is called, the JHS scans the intermediate done directory and populates its cache. So, when a client asks for finished jobs between 3PM and 4PM, it is guaranteed to get every job that finished in that duration. It seems like you are hinting that this contract is broken. Is it?

Also, in the patch I don't see where aBegin and aEnd are getting used. In CachedHistoryStorage, aBegin and aEnd are validated but never used.

It does look like I'm missing something, so please bear with me. I'm just making sure we are doing the right thing. If needed, the same API can also be added to the Application History Server, which I am watching over.

Tx.

> Allow querying the JobHistoryServer by job arrival time
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5577
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5577.patch
>
>
> The JobHistoryServer REST APIs currently allow querying by job submit time
> and finish time. However, jobs don't necessarily arrive in order of their
> finish time, meaning that a client who wants to stay on top of all completed
> jobs needs to query large time intervals to make sure they're not missing
> anything.
> Exposing functionality to allow querying by the time a job lands
> at the JobHistoryServer would allow clients to set the start of their query
> interval to the time of their last query.
> The arrival time of a job would be defined as the time that it lands in the
> done directory and can be picked up using the last modified date on history
> files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
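To make the disagreement concrete, here is a toy Python simulation (not JHS code) of the two polling strategies discussed above. It assumes, as the issue description does, that a job's history file can reach the JobHistoryServer some time after the finish time recorded by its AM. Times are minutes past 3PM, and `job_4`, the poll times, and every function name are hypothetical. The finish-time client moves its window ahead to the latest finish time it has seen, as the comment suggests; the arrival-time client starts each query at the time of its previous query, as the issue proposes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Job:
    job_id: str
    finish_time: int   # minutes past 3PM, as recorded by the AM
    arrival_time: int  # minutes past 3PM, when the history file reaches the JHS


def visible(jobs: Iterable[Job], now: int) -> List[Job]:
    """Jobs the history server can report at `now`: only those already arrived."""
    return [j for j in jobs if j.arrival_time <= now]


def poll_by_finish_time(jobs: List[Job], poll_times: List[int]) -> Set[str]:
    """Query [begin, now] on finish time, then move `begin` to the latest finish seen."""
    seen: Set[str] = set()
    begin = 0
    for now in poll_times:
        batch = [j for j in visible(jobs, now) if begin <= j.finish_time <= now]
        seen.update(j.job_id for j in batch)
        if batch:
            begin = max(j.finish_time for j in batch)
    return seen


def poll_by_arrival_time(jobs: List[Job], poll_times: List[int]) -> Set[str]:
    """Query (last_query, now] on arrival time; each query starts where the last ended."""
    seen: Set[str] = set()
    last_query = -1  # earlier than any possible arrival
    for now in poll_times:
        batch = [j for j in visible(jobs, now) if last_query < j.arrival_time <= now]
        seen.update(j.job_id for j in batch)
        last_query = now
    return seen


# The three jobs from the comment, plus a hypothetical job_4 that finished
# at 3:10 but whose history file only lands at 3:20.
JOBS = [
    Job("job_1", finish_time=1, arrival_time=2),
    Job("job_2", finish_time=3, arrival_time=4),
    Job("job_3", finish_time=15, arrival_time=16),
    Job("job_4", finish_time=10, arrival_time=20),
]
POLLS = [18, 60]  # the client polls at 3:18 and again at 4:00

print(sorted(poll_by_finish_time(JOBS, POLLS)))   # job_4 is never returned
print(sorted(poll_by_arrival_time(JOBS, POLLS)))  # all four jobs are returned
```

Under that assumption the finish-time poller skips `job_4`: when it first becomes visible at 3:20, its finish time (3:10) already lies behind the window start (3:15). If, as the comment argues, a query always sees every job whose finish time precedes the query, then `job_4` would already be visible at 3:18 and both strategies would return all four jobs.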
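The issue defines a job's arrival time as the last-modified date of its history files in the done directory. Below is a minimal local-filesystem sketch of that lookup. It is not the JHS implementation, which works against a Hadoop `FileSystem` (where `FileStatus#getModificationTime()` exposes the equivalent timestamp). The function name `arrived_between`, the half-open `(begin, end]` interval, and the `.jhist` suffix filter are assumptions for illustration.

```python
import os
from typing import List, Tuple


def arrived_between(done_dir: str, begin: float, end: float,
                    suffix: str = ".jhist") -> List[Tuple[str, float]]:
    """Return (file name, mtime) for history files under `done_dir` whose
    last-modified time falls in the half-open interval (begin, end],
    oldest first. `suffix` is an assumed filter for history files."""
    hits: List[Tuple[str, float]] = []
    # Walk subdirectories too, in case history files are nested below done_dir.
    for root, _dirs, files in os.walk(done_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            mtime = os.path.getmtime(os.path.join(root, name))
            if begin < mtime <= end:
                hits.append((name, mtime))
    hits.sort(key=lambda item: item[1])
    return hits
```

A client would pass the `end` of its previous call as the next `begin`, so consecutive calls neither overlap nor leave a gap, which is the "start of the query interval is the time of the last query" behavior the issue asks for.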