Whojohn opened a new issue, #17840: URL: https://github.com/apache/dolphinscheduler/issues/17840

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

# Purpose

When Linux triggers an OOM kill, the kernel should preferentially kill the worker's shell tasks and their subclass implementations (the parts most prone to OOM), not DolphinScheduler (dolp) component processes. The design should also make the kernel kill dolp services such as worker/master as late as possible, to keep them stable.

# Version Discussion

> dolp: 3.4 (dev)
>
> Reference (Linux OOM killer): https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-oom-adj-adjust-the-oom-killer-score

# Improvements

1. **When users enable cgroup (task.resource.limit.state=true), set oom_score_adj to 1000 in BaseLinuxShellInterceptorBuilder.**
   - BaseLinuxShellInterceptorBuilder#now
     ```
     sudo systemd-run -q --scope -p CPUQuota=1% -p MemoryLimit=512M --uid=root bash /tmp/xxx/971445_1934185.command
     ```
   - BaseLinuxShellInterceptorBuilder#after (the `bash -c` argument must be quoted, otherwise `>` and `&&` are interpreted by the calling shell instead of the task shell)
     ```
     sudo systemd-run -q --scope -p CPUQuota=5% -p MemoryLimit=1500M --uid=root bash -c 'echo 1000 > /proc/self/oom_score_adj && exec bash /tmp/xxx/971445_1934185.command'
     ```
2. **Modify bin/dolphinscheduler-daemon.sh to set oom_score_adj to -1000 for the started service. Per the kernel documentation, -1000 disables OOM killing for that process entirely, so the OOM killer will not select it.**
   - dolphinscheduler-daemon#now
     ```
     nohup /bin/bash "$DOLPHINSCHEDULER_HOME/$command/bin/start.sh" > /dev/null 2> $log &
     ```
   - dolphinscheduler-daemon#after
     ```
     nohup /bin/bash "$DOLPHINSCHEDULER_HOME/$command/bin/start.sh" > /dev/null 2> $log &
     echo -1000 > /proc/$!/oom_score_adj
     ```

# Why this design

q1: Why require cgroup (resource limit mode) to be enabled? In container/k8s environments, writing to /proc is likely to fail with permission errors. Docker mode has been verified to fail to start with the default image.

q2: What is the benefit? When tasks are launched by concatenating shell commands, excessive concurrency immediately triggers an OOM kill.
If concurrency increases further, the OOM killer may even kill the dolp services.

q3: Should shell-type tasks be linked with task priority? A uniform oom_score_adj of 1000 makes tasks of every priority the first to be killed. The value could be tied to task priority, but deciding which value each priority level should get, and how to keep OOM kills fair, is hard to settle, so uniformly setting it to 1000 is simpler and more practical.

### Are you willing to submit a PR?

- [x] Yes I am willing to submit a PR!
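
The two shell-side changes under Improvements can be sketched as follows. This is a minimal sketch, not DolphinScheduler code. `build_task_command` and `start_with_oom_score_adj` are hypothetical helper names, and the sketch assumes bash on Linux with a `/proc` filesystem. The first function shows the quoting that the `bash -c` wrapper needs. The second wraps the daemon's `nohup` line and reports a rejected `/proc` write instead of aborting, because lowering a score to -1000 requires root (CAP_SYS_RESOURCE) and `/proc` may be read-only in containers.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two proposed changes as HYPOTHETICAL helper functions.
# Neither function exists in DolphinScheduler; names are illustrative only.

# Change 1 (BaseLinuxShellInterceptorBuilder, cgroup mode):
# print the systemd-run line with the inner command passed to `bash -c`
# as ONE quoted argument, so `>` and `&&` run inside the task's shell
# instead of in the shell that invokes sudo.
build_task_command() {
  local cpu_quota="$1" memory_limit="$2" script="$3"
  local inner
  # Raise this shell's own OOM score, then exec the script: exec keeps the
  # PID, so the script runs with oom_score_adj=1000 (its children inherit it).
  inner="echo 1000 > /proc/self/oom_score_adj && exec bash $(printf '%q' "$script")"
  # %q re-quotes the inner command so it survives one round of shell parsing.
  printf 'sudo systemd-run -q --scope -p CPUQuota=%s -p MemoryLimit=%s --uid=root bash -c %q\n' \
    "$cpu_quota" "$memory_limit" "$inner"
}

# Change 2 (bin/dolphinscheduler-daemon.sh):
# start_with_oom_score_adj ADJ LOG CMD [ARGS...]
# Starts CMD in the background like the daemon's nohup line, writes ADJ to
# /proc/<pid>/oom_score_adj of that process, and prints the PID on stdout.
start_with_oom_score_adj() {
  local adj="$1" log="$2"
  shift 2
  nohup "$@" > /dev/null 2> "$log" &
  local pid=$!
  # Going below the current floor (e.g. to -1000) needs CAP_SYS_RESOURCE,
  # and /proc may be read-only in containers: warn instead of aborting.
  if ! { echo "$adj" > "/proc/$pid/oom_score_adj"; } 2> /dev/null; then
    echo "warn: could not set oom_score_adj=$adj for pid $pid" >&2
  fi
  echo "$pid"
}

# Example: print the corrected interceptor command.
build_task_command 5% 1500M /tmp/xxx/971445_1934185.command
```

The daemon would then call something like `start_with_oom_score_adj -1000 "$log" /bin/bash "$DOLPHINSCHEDULER_HOME/$command/bin/start.sh"`. Note that `oom_score_adj` is inherited across `fork()`. Any process the service forks after the write would therefore also start at -1000, including task shells that do not reset the value (for example when cgroup mode is off). Children forked before the write keep the old value.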

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
