QQhuxuhui opened a new pull request, #15094: URL: https://github.com/apache/dolphinscheduler/pull/15094
Fix the YARN resource leakage issue caused by the HikariCP connection pool

fix: [#14187](https://github.com/apache/dolphinscheduler/issues/14187)
fix: [#6092](https://github.com/apache/dolphinscheduler/issues/6092)

## Purpose of the pull request

This pull request addresses a resource leakage issue caused by the connection pool. Since version 2.x, third-party data sources use the HikariCP connection pool. The problem is as follows:

1. When running tasks in Hive on Spark mode, a Hive JDBC connection is much heavier than a regular RDBMS connection. When SQL is executed over the connection, HiveServer2 (HS2) requests container resources from YARN and starts a distributed Spark on YARN application to run the compiled SQL. After the SQL finishes, the Spark on YARN resources are not released immediately; they are held for a period so they can be reused by new SQL submitted over the same JDBC connection. The resources are only fully released when the JDBC connection is closed, or when the configured timeout elapses without any new SQL from the client.
2. With a database connection pool, closing a JDBC connection merely returns it to the pool; the underlying physical connection stays open. As a result, the Spark on YARN resources tied to that connection are not released promptly, causing a resource leak. Other jobs requesting YARN resources then have to wait in a queue, delaying their execution.
3. In this project, HikariCP is used as the database connection pool, and the idle timeout (`idleTimeout`) for database connections is not configured, so the effective value is HikariCP's default of 10 minutes. After each connection's SQL job completes, it therefore takes 10 minutes before the associated Spark on YARN resources are actually released, leaving other jobs queued for YARN resources.

The approach in this solution is to tune the connection pool parameters, in particular the minimum idle connection count and the idle timeout, so that idle database connections are closed more quickly and proactively. For example, setting `idleTimeout` to 30 seconds and `minimumIdle` to 0 means every connection is closed 30 seconds after its SQL job finishes, releasing all the Spark on YARN resources and resolving the resource leakage issue.

## Brief change log

- Adjust the HikariCP pool parameters (`minimumIdle`, `idleTimeout`) used for third-party data source connections so that idle connections are closed promptly.

## Verify this pull request

This pull request is code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(or)*

If your pull request contains an incompatible change, you should also add it to `docs/docs/en/guide/upgrede/incompatible.md`
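The tuning described above can be sketched as a HikariCP configuration. This is a minimal illustration of the idea, not the exact diff in this PR; the JDBC URL is hypothetical, and the setter names follow the public HikariCP API:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolTuningSketch {

    public static HikariDataSource buildDataSource() {
        HikariConfig config = new HikariConfig();
        // Hypothetical Hive JDBC URL; in practice this comes from the datasource definition.
        config.setJdbcUrl("jdbc:hive2://hs2-host:10000/default");
        // Keep no minimum pool of idle connections, so every idle connection is eligible
        // for retirement (idleTimeout only applies when minimumIdle < maximumPoolSize).
        config.setMinimumIdle(0);
        // Retire idle connections after 30 seconds instead of HikariCP's 10-minute default,
        // so HiveServer2 can release the associated Spark on YARN resources promptly.
        config.setIdleTimeout(30_000L);
        return new HikariDataSource(config);
    }
}
```

With this configuration, once a task's SQL finishes and the connection sits idle, HikariCP evicts it within roughly 30 seconds, which closes the underlying Hive JDBC connection and lets YARN reclaim the Spark resources.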
