iodone opened a new issue, #5402: URL: https://github.com/apache/kyuubi/issues/5402
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before creating

- [X] I have searched in the [task list](https://github.com/orgs/apache/projects/296) and found no similar tasks.

### Mentor

- [ ] I have sufficient knowledge and experience of this task, and I volunteer to be the mentor of this task to guide contributors to complete the task.

### Skill requirements

- Be familiar with how Spark plugins and Kyuubi engine plugins are integrated.
- Understand how Spark JVM metrics are collected.

### Background and Goals

When memory usage in the Spark engine gets out of control, we typically rely on jvmkill as a remedy: it kills the process and generates a heap dump for later analysis. Even with jvmkill protection, however, a JVM that is running low on memory can still cause problems, such as running Full GC over and over while the application does no useful work between pauses. Because the JVM never fully exhausts its resources in that state, jvmkill is not triggered. Introducing JVMQuake would provide more granular monitoring of GC behavior, allowing memory management issues to be detected early so that the process can fail fast.

### Implementation steps

1. Start JVMQuake for the driver and executors through Spark plugins.
2. Collect GC metrics using JVMQuake.
3. Set the rules for killing processes and specify the path where the heap dump is saved.

### Additional context

_No response_
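To make steps 2 and 3 of the implementation steps more concrete, here is a minimal sketch of a GC "debt" rule, loosely inspired by the idea behind JVMQuake. It is not the Kyuubi or JVMQuake implementation. The class name `GcDebtMonitor` and the parameters `killThresholdMs` and `repayFactor` are hypothetical; only `ManagementFactory.getGarbageCollectorMXBeans()` and `GarbageCollectorMXBean.getCollectionTime()` are standard JDK APIs. Note that for concurrent collectors the reported collection time may include more than pure pause time, so a real implementation would likely need a more precise signal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical GC "debt" tracker: GC time adds debt, useful run time repays it. */
class GcDebtMonitor {
    private final long killThresholdMs; // assumed rule: debt above this marks the JVM unhealthy
    private final double repayFactor;   // assumed: debt repaid per millisecond of non-GC run time
    private double debtMs = 0.0;
    private long lastGcTotalMs = totalGcTimeMs();
    private long lastSampleNanos = System.nanoTime();

    GcDebtMonitor(long killThresholdMs, double repayFactor) {
        this.killThresholdMs = killThresholdMs;
        this.repayFactor = repayFactor;
    }

    /** Accumulated collection time across all collectors, in milliseconds. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 when the value is undefined
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Pure update step; returns true once the accumulated debt exceeds the threshold. */
    boolean update(long gcDeltaMs, long wallDeltaMs) {
        long runMs = Math.max(0, wallDeltaMs - gcDeltaMs);
        debtMs = Math.max(0.0, debtMs + gcDeltaMs - runMs * repayFactor);
        return debtMs > killThresholdMs;
    }

    /** Samples the live JVM since the previous call and applies the update step. */
    boolean sample() {
        long nowNanos = System.nanoTime();
        long gcTotalMs = totalGcTimeMs();
        long wallDeltaMs = (nowNanos - lastSampleNanos) / 1_000_000L;
        long gcDeltaMs = gcTotalMs - lastGcTotalMs;
        lastSampleNanos = nowNanos;
        lastGcTotalMs = gcTotalMs;
        return update(gcDeltaMs, wallDeltaMs);
    }

    double debtMs() {
        return debtMs;
    }
}
```

In a plugin, something like `sample()` could be called periodically on the driver and on each executor; when it returns `true`, the process would dump its heap to the configured path and exit. How that action is wired into Spark plugins is left open here.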
