XComp opened a new pull request #15090: URL: https://github.com/apache/flink/pull/15090
## What is the purpose of the change

The test had two problems:
1. The parallelism of the job exceeded the available slots, which caused a resource timeout for every job run.
2. There is a known race condition between the `ResourceManager` cleaning up the requirements in the `DefaultDeclarativeSlotPool` while freeing the finished job's resources and the corresponding `TaskExecutor` freeing its tasks as part of the job cleanup.

## Brief change log

* The job's parallelism was lowered.
* A new parameter `taskmanager.slot.timeout` is introduced that makes the time after which a slot becomes inactive configurable independently of the rpc timeout, which was used before.
* The new parameter is used in `AdaptiveSchedulerSlotSharingITCase` to break the race condition between the two cleanup mechanisms: the slot is now freed faster if it was accidentally allocated again for the already finished job because the `TaskExecutor` cleaned up faster than the `ResourceManager` (see the sketch at the end of this description).

## Verifying this change

We ran the test in a loop; before the change it failed consistently. The [AzureCI run](https://dev.azure.com/mapohl/flink/_build/results?buildId=307&view=results) failed only because no error was caught anymore.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? docs
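For illustration, here is a minimal sketch of how the new `taskmanager.slot.timeout` option could be applied in a MiniCluster-based ITCase; the class name `SlotTimeoutExampleITCase`, the `1 s` value, and the cluster sizing are hypothetical and not taken from this PR:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;

public class SlotTimeoutExampleITCase {

    private static Configuration createConfiguration() {
        final Configuration conf = new Configuration();
        // Free idle slots after 1 second instead of falling back to the
        // rpc timeout; the key is the new option introduced by this PR,
        // the value is an illustrative assumption.
        conf.setString("taskmanager.slot.timeout", "1 s");
        return conf;
    }

    @ClassRule
    public static final MiniClusterWithClientResource MINI_CLUSTER =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setConfiguration(createConfiguration())
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(2)
                            .build());

    // ... test methods submitting jobs against MINI_CLUSTER ...
}
```

Lowering the timeout only in the test configuration keeps the production default (falling back to the rpc timeout) untouched while letting the test recover quickly from a slot that was accidentally re-allocated.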
