[ 
https://issues.apache.org/jira/browse/FLINK-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240410#comment-17240410
 ] 

Chesnay Schepler commented on FLINK-20195:
------------------------------------------

Am I understanding you correctly that this happens when the job transitioned to 
CANCELED? Or can it be reproduced for any job, regardless of state transitions?

If it only happens during a transition, then it is likely caused by 
{{Dispatcher#requestMultipleJobDetails}} not having a fully consistent view 
over all jobs. It first queries all job masters for their respective jobs, and 
then the execution graph store, where completed jobs reside. It is 
conceivable that a JM returns a job, the job is then archived to the store, 
and we then retrieve the same job from the store.
An easy fix would be to de-duplicate entries based on the Job ID.
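
A minimal sketch of what such a de-duplication could look like. This is not the actual Dispatcher code: {{JobEntry}}, {{JobDeduplication}} and {{deduplicateByJobId}} are hypothetical names used only for illustration, and preferring the archived entry over the one reported by the JM is an assumption, not something decided in this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JobDeduplication {

    /** Hypothetical stand-in for one entry of the jobs overview; not a Flink class. */
    static final class JobEntry {
        final String jobId;
        final String status;

        JobEntry(String jobId, String status) {
            this.jobId = jobId;
            this.status = status;
        }

        @Override
        public String toString() {
            return jobId + ":" + status;
        }
    }

    /**
     * Merges the two views and keeps exactly one entry per job ID.
     * Assumption: if a job shows up in both views, the archived entry wins,
     * because it reflects the terminal state of the job.
     */
    static List<JobEntry> deduplicateByJobId(
            Collection<JobEntry> fromJobMasters, Collection<JobEntry> fromStore) {
        // LinkedHashMap keeps the order in which job IDs were first seen.
        Map<String, JobEntry> byJobId = new LinkedHashMap<>();
        for (JobEntry entry : fromJobMasters) {
            byJobId.put(entry.jobId, entry);
        }
        // Inserted last, so an archived entry replaces a JM entry with the same ID.
        for (JobEntry entry : fromStore) {
            byJobId.put(entry.jobId, entry);
        }
        return new ArrayList<>(byJobId.values());
    }

    public static void main(String[] args) {
        // The race described above: the JM still reports the job, and the
        // store already contains it as well.
        List<JobEntry> fromJobMasters = Arrays.asList(
                new JobEntry("e110531c08dd4e3dbbfcf7afc1629c3d", "FAILED"),
                new JobEntry("53fd11db25394308862c997dce9ef990", "CANCELED"));
        List<JobEntry> fromStore = Arrays.asList(
                new JobEntry("53fd11db25394308862c997dce9ef990", "CANCELED"));

        System.out.println(deduplicateByJobId(fromJobMasters, fromStore));
    }
}
```

Keying on the Job ID alone means the result contains each job once, regardless of which of the two views reported it.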

> Jobs endpoint returns duplicated jobs
> -------------------------------------
>
>                 Key: FLINK-20195
>                 URL: https://issues.apache.org/jira/browse/FLINK-20195
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / REST
>    Affects Versions: 1.11.2
>            Reporter: Ingo Bürk
>            Priority: Minor
>             Fix For: 1.12.0
>
>
> The GET /jobs endpoint can, for a split second, return a duplicated job after 
> it has been cancelled. This occurred in Ververica Platform after canceling a 
> job (using PATCH /jobs/{jobId}) and calling GET /jobs.
> I've reproduced this and queried the endpoint in a relatively tight loop (~ 
> every 0.5s) to log the responses of GET /jobs and got this:
>  
>  
> {code:java}
> …
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"RUNNING"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELLING"}]}
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"RUNNING"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELLING"}]}
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"}]}
> {"jobs":[{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"}]}
> {"jobs":[{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"}]}
> …{code}
>  
> You can see that, for just a moment, the endpoint returned the 
> same Job ID twice.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
