Hi,

I am working on auto-scaling support for native deployments. Today Flink provides Reactive Mode, but it only runs on standalone deployments, and we use the Kubernetes native deployment. I want to be able to increase or decrease job resources for our streaming jobs. The recent FLIP-138 and FLIP-160 are very useful for achieving this goal. I have started reading the code of the Flink JobManager, AdaptiveScheduler, DeclarativeSlotPool, etc.
My assumption is that the required resources will be calculated in the AdaptiveScheduler whenever the scheduler receives a heartbeat from a TaskManager, via the public void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot) method. I checked the TaskExecutorToJobManagerHeartbeatPayload class, but I only see *accumulatorReport* and *executionDeploymentReport* there. Do you have any suggestions for collecting metrics from the TaskManagers? Should I add metrics to TaskExecutorToJobManagerHeartbeatPayload? I am open to other suggestions.

Once I finalize my investigation, I will create a FLIP with a more detailed implementation.

Thanks in advance for your help.

Talat