Ah no, I meant that I wouldn't use a stateful set, rather just adjust the names of the pods that are created/managed directly by the job manager.
Regards, Alexis. Am Mo., 3. Juni 2024 um 07:31 Uhr schrieb Xintong Song < tonysong...@gmail.com>: > I may not have understood what you mean by the naming scheme. I think the > limitation "pods in a StatefulSet are always terminated in the reverse > order as they are created" comes from Kubernetes and has nothing to do with > the naming scheme. > > Best, > > Xintong > > > > On Mon, Jun 3, 2024 at 1:13 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > > > Hi Xintong, > > > > After experimenting a bit, I came to roughly the same conclusion: cleanup > > is what's more or less incompatible if Kubernetes manages the pods. Then > it > > might be better to just allow using a more stable pod naming scheme that > > doesn't depend on the attempt number and thus produces more stable task > > manager metrics. I'll explore that. > > > > Regards, > > Alexis. > > > > On Mon, 3 Jun 2024, 03:35 Xintong Song, <tonysong...@gmail.com> wrote: > > > > > I think the reason we didn't choose StatefulSet when introducing the > > Native > > > K8s Deployment is that, IIRC, we want Flink's ResourceManager to have > > full > > > control of the individual pod lifecycles. > > > > > > E.g., > > > - Pods in a StatefulSet are always terminated in the reverse order as > > they > > > are created. This prevents us from releasing a specific idle TM that is > > not > > > necessarily created lastly. > > > - If a pod is unexpectedly terminated, Flink's ResourceManager should > > > decide whether to restart it or not according to the job status. > > > (Technically, the same issue as above, that we may want pods to be > > > terminated / deleted in a different order.) > > > > > > There might be some other reasons. I just cannot recall all the > details. > > > > > > As for determining whether a pod is OOM killed, I think Flink does > print > > > diagnostics for terminated pods in JM logs, i.e. the `exitCode`, > `reason` > > > and `message` of the `Terminated` container state. In our production, > it > > > shows "(exitCode=137, reason=OOMKilled, message=null)". However, since > > the > > > diagnostics are from K8s, I'm not 100% sure whether this behavior is > same > > > for all K8s versions,. > > > > > > Best, > > > > > > Xintong > > > > > > > > > > > > On Sun, Jun 2, 2024 at 7:35 PM Alexis Sarda-Espinosa < > > > sarda.espin...@gmail.com> wrote: > > > > > > > Hi devs, > > > > > > > > Some time ago I asked about the way Task Manager pods are handled by > > the > > > > native Kubernetes driver [1]. I have now looked a bit through the > > source > > > > code and I think it could be possible to deploy TMs with a stateful > > set, > > > > which could allow tracking OOM kills as I mentioned in my original > > email, > > > > and could also make it easier to track metrics and create alerts, > since > > > the > > > > labels wouldn't change as much. > > > > > > > > One challenge is probably the new elastic scaling features [2], since > > the > > > > driver would have to differentiate between new pod requests due to a > TM > > > > terminating, and a request due to scaling. I'm also not sure where > > > > downscaling requests are currently handled. > > > > > > > > I would be interested in taking a look at this and seeing if I can > get > > > > something working. I think it would be possible to make it > configurable > > > in > > > > a way that maintains backwards compatibility. Would it be ok if I > > enter a > > > > Jira ticket and try it out? > > > > > > > > Regards, > > > > Alexis. > > > > > > > > [1] https://lists.apache.org/thread/jysgdldv8swgf4fhqwqochgf6hq0qs52 > > > > [2] > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/elastic_scaling/ > > > > > > > > > >