[ 
https://issues.apache.org/jira/browse/FLINK-20195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17455164#comment-17455164
 ] 

Samuel Lacroix commented on FLINK-20195:
----------------------------------------

This issue is very problematic for us. It happens very frequently, and the 
duplicated job never disappears. Worse : it eventually leads to a situation 
where the JM tries two restore two checkpoints from ZK instead of one, and 
fails because only one exists.

 

We're considering forking flink just to get rid of this FLINK-22434 ticket that 
introduced the bug. Which is not ideal.

> Jobs endpoint returns duplicated jobs
> -------------------------------------
>
>                 Key: FLINK-20195
>                 URL: https://issues.apache.org/jira/browse/FLINK-20195
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / REST
>    Affects Versions: 1.11.2
>            Reporter: Ingo Bürk
>            Priority: Minor
>
> The GET /jobs endpoint can, for a split second, return a duplicated job after 
> it has been cancelled. This occurred in Ververica Platform after canceling a 
> job (using PATCH /jobs/\{jobId}) and calling GET /jobs.
> I've reproduced this and queried the endpoint in a relatively tight loop (~ 
> every 0.5s) to log the responses of GET /jobs and got this:
>  
>  
> {code:java}
> …
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"RUNNING"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELLING"}]}
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"RUNNING"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELLING"}]}
> {"jobs":[{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"}]}
> {"jobs":[{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"}]}
> {"jobs":[{"id":"53fd11db25394308862c997dce9ef990","status":"CANCELED"},{"id":"e110531c08dd4e3dbbfcf7afc1629c3d","status":"FAILED"}]}
> …{code}
>  
> You can see in in between that for just a moment, the endpoint returned the 
> same Job ID twice.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to