StephanEwen commented on issue #8846: [FLINK-12766][runtime] Dynamically allocate TaskExecutor's managed memory to slots. URL: https://github.com/apache/flink/pull/8846#issuecomment-510169482 The code is overall in decent shape. I am a bit unsure, though, about this approach. For example all the changes on the TaskManager result in something quite unintuitive to me. - there are slots, they have a resource profile - then an allocated resource profile, which is possibly a subset of that. - some operations (for example to identify idempotent requests) now take the new resource profile into account. I cannot even say whether that is correct or not - probably not breaking anything immediately, but breaks with the design of how requests are identified and can cause tricky bugs later. - the TM may now request requests for free slots based on its bookkeeping. While this should typically not happen when the RM bookkeeping is in sync, what happens when the views diverge, due to a request timeout? The slot allocation protocol is so sensitive, do we really want to add this additional complexity? I assume that the reason for all the TM changes is a combination of - when the JM allocates a slot, it actually gets that slot offered from the TM directly, and not from the RM. So the profile somehow needs to go through the TM to end up in the Slot on the JM - some safety net bookkeeping This solution is a temporary solution, an in the long run we probably want to either have fully dynamically sized slots, allocated and sized on demand, or keep the slots are static and change the operators in the APIs to adjust their resource consumption to what is there in the slots. Having a temporary solution, of this big complexity, touching something so sensitive as the slot allocation, and for a new SQL runtime that is in its first release and might change quite a bit in the next release... I simply do not feel comfortable to merge this. The risk of bugs and regression, and the added complexity seems so high compared to the benefit. I would be happy to double down on a FLIP about how resource management should look like in the next versions and make sure we have something really well working in the following release.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
