[ 
https://issues.apache.org/jira/browse/FLINK-11059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708190#comment-16708190
 ] 

ASF GitHub Bot commented on FLINK-11059:
----------------------------------------

shuai-xu opened a new pull request #7227: [FLINK-11059] [runtime] do not add 
releasing failed slot to free slots
URL: https://github.com/apache/flink/pull/7227
 
 
   
   ## What is the purpose of the change
   
   *This PR fixes a bug in which the job master adds a slot back to its free 
slots when releasing it to the task executor fails, for example due to a 
timeout. The task executor may already have released the slot, yet the job 
master may continue to deploy tasks to it.*
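   
   For illustration only, the behavior described above can be sketched with a toy slot pool. This is not the code from this pull request and not Flink's `SlotPool` API: the class `IdleSlotReleaseSketch`, its methods, and the `freeSlotRpc` stand-in for the call to the task executor are all hypothetical names chosen for this sketch. The point it shows is that a slot removed for release is not put back into the available set when the release call fails, because the outcome on the task executor side is unknown.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Toy model of an idle-slot release. All names are hypothetical, not Flink's API. */
public class IdleSlotReleaseSketch {

    private final Set<String> availableSlots = new HashSet<>();

    // Hypothetical stand-in for the RPC asking the task executor to free a slot.
    private final Function<String, CompletableFuture<Void>> freeSlotRpc;

    public IdleSlotReleaseSketch(Function<String, CompletableFuture<Void>> freeSlotRpc) {
        this.freeSlotRpc = freeSlotRpc;
    }

    public void offerSlot(String slotId) {
        availableSlots.add(slotId);
    }

    public boolean isAvailable(String slotId) {
        return availableSlots.contains(slotId);
    }

    /** Releases an idle slot. The slot is not returned to the pool, even if the RPC fails. */
    public CompletableFuture<Void> releaseIdleSlot(String slotId) {
        if (!availableSlots.remove(slotId)) {
            // Nothing to release.
            return CompletableFuture.completedFuture(null);
        }
        return freeSlotRpc.apply(slotId).<Void>handle((ignored, failure) -> {
            if (failure != null) {
                // The task executor may have freed the slot before the timeout
                // fired, so its state is unknown. Discard the slot instead of
                // adding it back to availableSlots.
                System.out.println("Discarding slot " + slotId + " after failed release");
            }
            return null;
        });
    }

    public static void main(String[] args) {
        // Simulate a release call that fails with a timeout.
        CompletableFuture<Void> timedOut = new CompletableFuture<>();
        timedOut.completeExceptionally(new TimeoutException("release timed out"));

        IdleSlotReleaseSketch pool = new IdleSlotReleaseSketch(id -> timedOut);
        pool.offerSlot("slot-1");
        pool.releaseIdleSlot("slot-1").join();

        System.out.println("slot-1 available: " + pool.isAvailable("slot-1"));
    }
}
```

   In this sketch the failed release leaves `slot-1` out of the available set, so no later task could be scheduled onto it. The buggy behavior described above would correspond to re-adding the slot inside the failure branch.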
   
   ## Verifying this change
   
   This change adds tests and can be verified as follows:
   
     - *Run SlotPoolTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> JobMaster may continue using an invalid slot if releasing idle slot meet a 
> timeout
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-11059
>                 URL: https://issues.apache.org/jira/browse/FLINK-11059
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>    Affects Versions: 1.7.0
>            Reporter: shuai.xu
>            Assignee: shuai.xu
>            Priority: Major
>              Labels: pull-request-available
>
> When the job master releases an idle slot to the task executor, the release 
> call may fail with a timeout exception. The task executor may already have 
> released the slot, but the job master adds the slot back to its available 
> slots, so the slot may be used again. The job master then continues 
> deploying tasks to the slot, which the task executor no longer recognizes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
