## What is the purpose of the change

*Wait for the results of asynchronous operations to be served before shutting 
down the cluster. This is necessary for the _"cancel with savepoint"_ 
operation: if we do not wait for the client to access the result, we 
may shut down the cluster first, and the client then gets a `ConnectionException`.*
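
A minimal sketch of the idea, for illustration only. `PendingResultCache` and its methods are hypothetical names and do not reflect the actual `CompletedOperationCache` in this change. It assumes that shutdown is modeled as a future that completes only once every stored result has been handed to a client; a real implementation would presumably also need a timeout so that shutdown cannot block forever.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not Flink's actual CompletedOperationCache. */
class PendingResultCache<K, R> {

    // Results of finished asynchronous operations that no client has fetched yet.
    private final Map<K, R> unservedResults = new ConcurrentHashMap<>();

    // Completes once shutdown was requested and all results have been served.
    private final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();

    private volatile boolean closing = false;

    /** Stores the result of a finished asynchronous operation. */
    void registerResult(K key, R result) {
        unservedResults.put(key, result);
    }

    /** Hands the result to the client and marks it as served. */
    R serve(K key) {
        R result = unservedResults.remove(key);
        maybeCompleteTermination();
        return result;
    }

    /** Requests shutdown; the returned future completes when nothing is left unserved. */
    CompletableFuture<Void> closeAsync() {
        closing = true;
        maybeCompleteTermination();
        return terminationFuture;
    }

    private void maybeCompleteTermination() {
        if (closing && unservedResults.isEmpty()) {
            terminationFuture.complete(null);
        }
    }
}
```

Under these assumptions, a shutdown path could chain the actual cluster teardown onto `closeAsync()` (for example with `thenRun`), so that a client polling for the savepoint result still finds the endpoint alive.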

cc: @zentol @tillrohrmann 

## Brief change log

  - *Before shutting down the cluster, wait for the results of asynchronous operations to be served.*
  - *Log the stack trace if a checkpoint cannot be acknowledged.*


## Verifying this change

This change added tests and can be verified as follows:

  - *Added test to `RestServerEndpointITCase` to verify that handlers are 
closed first.*
  - *Added unit tests for `CompletedOperationCache`.*
  - *Verified the changes by repeatedly submitting a job and cancelling it with 
a savepoint in a loop.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / **no** / 
don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)


[ Full content available at: https://github.com/apache/flink/pull/6785 ]