zentol opened a new pull request #13553:
URL: https://github.com/apache/flink/pull/13553
Based on #13544.
Adds the `DeclarativeSlotManager` (DSM), an alternative `SlotManager`
implementation that supports FLIP-138.
The core difference from the existing implementation is that the DSM does not
receive individual slot requests from the JobMaster; instead it receives a
`ResourceRequirements` object that declares the absolute (total) requirements
for the job.
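To illustrate what "absolute requirements" means here, the following is a minimal sketch and not the code from this PR. The class `DeclaredRequirementsSketch`, its methods, and the use of plain strings for job IDs and resource profiles are all hypothetical; treating an empty declaration as "the job needs nothing anymore" is likewise an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a declarative requirements holder; not Flink's actual classes. */
public class DeclaredRequirementsSketch {

    // jobId -> (resource profile name -> total number of slots required)
    private final Map<String, Map<String, Integer>> requirementsByJob = new HashMap<>();

    /**
     * Replaces whatever was previously declared for the job. The declaration is
     * absolute: it is the new total, not an increment on top of the old one.
     */
    public void declare(String jobId, Map<String, Integer> totalSlotsByProfile) {
        if (totalSlotsByProfile.isEmpty()) {
            // Assumption for this sketch: an empty declaration clears the job's entry.
            requirementsByJob.remove(jobId);
        } else {
            requirementsByJob.put(jobId, new HashMap<>(totalSlotsByProfile));
        }
    }

    /** Returns the currently declared total for the given job and profile, or 0 if none. */
    public int required(String jobId, String profile) {
        return requirementsByJob
                .getOrDefault(jobId, Collections.emptyMap())
                .getOrDefault(profile, 0);
    }

    public static void main(String[] args) {
        DeclaredRequirementsSketch sketch = new DeclaredRequirementsSketch();
        sketch.declare("job-1", Collections.singletonMap("default", 4));
        // The second declaration overwrites the first; the totals are not summed.
        sketch.declare("job-1", Collections.singletonMap("default", 2));
        System.out.println(sketch.required("job-1", "default")); // prints 2
    }
}
```

With individual slot requests, lowering the demand would require cancelling specific requests; with a declaration, the sender just states the new total.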
The DSM keeps track of these, along with all registered and pending slots,
matches requirements to slots, and initiates the corresponding slot allocations.
Most of this logic and book-keeping is handled by recently added components
(`SlotTracker`, `ResourceTracker`, `TaskExecutorManager`).
What is left in the DSM itself is:
a) a bunch of glue between these components and the `SlotManager` API, and
b) the logic for matching missing resources to available/pending resources and
initiating the corresponding slot allocations.
The entry point for b) is `checkResourceRequirements()`, which is called on
any significant change to the state of requirements or resources, that is,
when requirements change, slots are added or removed, or allocations
complete or fail.
This method essentially does the following:
- retrieve the currently missing resources from the `ResourceTracker`
- try assigning these resources to free slots, provided by the `SlotTracker`
  - if a match was found, start the allocation of the slot
- if there are still missing resources at the end of this process, match the
  remaining missing resources against the pending slots, provided by the
  `TaskExecutorManager`
  - if a match was found for a requirement, do nothing. We do not maintain a
    mapping of requirements to pending resources as it would double the
    book-keeping. We just consider this requirement as fulfilled within the
    scope of this check.
  - if no match was found for a requirement, try allocating a new worker
    (implicitly increasing the pool of pending slots)
- if missing resources are still present at the end of this process (i.e., no
  more workers can be allocated), inform the JobMaster that there are not
  enough resources to fulfill the requirements
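
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `checkResourceRequirements()` implementation: the class `RequirementCheckSketch`, the `Result` type, and all parameter names are hypothetical; all slots are treated as interchangeable (no resource profiles); and a requirement that triggers a new worker is assumed to be covered by one of that worker's pending slots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of one requirement check; not Flink's actual code. */
public class RequirementCheckSketch {

    /** Outcome of one check pass; all field names are made up for this sketch. */
    public static final class Result {
        public final List<String> allocatedSlots = new ArrayList<>();
        public int coveredByPendingSlots;
        public int workersRequested;
        public int unfulfillable;
    }

    public static Result check(
            int missing,
            List<String> freeSlots,
            int pendingSlots,
            int slotsPerWorker,
            int workersStillAllowed) {
        Result result = new Result();
        Deque<String> free = new ArrayDeque<>(freeSlots);

        // Step 1: match missing resources against free slots and start allocations.
        while (missing > 0 && !free.isEmpty()) {
            result.allocatedSlots.add(free.poll());
            missing--;
        }

        // Step 2: match what is left against pending slots. No mapping from
        // requirement to pending slot is kept; the count only lives in this pass.
        int pending = pendingSlots;
        while (missing > 0) {
            if (pending == 0 && workersStillAllowed > 0) {
                // No pending slot left: request a new worker, which grows the pending pool.
                workersStillAllowed--;
                result.workersRequested++;
                pending += slotsPerWorker;
            }
            if (pending == 0) {
                break; // no pending slots and no more workers can be requested
            }
            pending--;
            missing--;
            result.coveredByPendingSlots++;
        }

        // Step 3: whatever is still missing would be reported to the JobMaster.
        result.unfulfillable = missing;
        return result;
    }

    public static void main(String[] args) {
        Result r = check(6, Arrays.asList("s1", "s2"), 1, 2, 1);
        System.out.println(r.allocatedSlots);        // prints [s1, s2]
        System.out.println(r.coveredByPendingSlots); // prints 3
        System.out.println(r.workersRequested);      // prints 1
        System.out.println(r.unfulfillable);         // prints 1
    }
}
```

Because the sketch keeps the pending-slot count only in a local variable, running the check again simply recomputes the matching from the current state, which mirrors the choice above to avoid a persistent requirement-to-pending-slot mapping.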