risyomei opened a new issue, #4997: URL: https://github.com/apache/kyuubi/issues/4997
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

#### Overview

1. When a connection is established from Beeline to the KyuubiServer, a Spark application (specifically, `org.apache.kyuubi.engine.spark.SparkSQLEngine`) is submitted.
2. After the `SparkSQLEngine` is ready, the KyuubiServer connects to the engine with `KyuubiSyncThriftClient#createClient`.

#### Issue

The problem arises when you restart or stop the KyuubiServer while:

1. the `SparkSQLEngine` has already been accepted as a YARN application (as verified from the ResourceManager), and
2. the KyuubiServer has not yet connected to the engine.

In such a scenario, the `SparkSQLEngine` keeps running until it hits `kyuubi.session.engine.idle.timeout`.

This might not be an issue when the engine share level is USER, as other client connections could still use the engine. However, it is a waste of resources when the engine share level is CONNECTION, because all existing client connections are disconnected once the KyuubiServer is shut down, and the engine is not reachable by any client at that point.

#### Question

Is it expected behavior to rely on the timeout to terminate the engine under the CONNECTION share level?

In my opinion, a more efficient approach would be to take advantage of `kyuubi.session.engine.alive.probe.enabled`, or to implement a similar mechanism in which an engine that has not received any probe within a given timeframe kills itself.

### Affects Version(s)

master

### Kyuubi Server Log Output

_No response_

### Kyuubi Engine Log Output

_No response_

### Kyuubi Server Configurations

_No response_

### Kyuubi Engine Configurations

_No response_

### Additional context

_No response_

### Are you willing to submit PR?

- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
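To make the self-termination idea from the Question section more concrete, here is a minimal sketch of an engine-side watchdog. It is only an illustration under stated assumptions: `ProbeWatchdog`, `recordProbe`, `check`, and `start` are hypothetical names, not part of Kyuubi's API, and the sketch does not reflect how `kyuubi.session.engine.alive.probe.enabled` is actually implemented. The engine would record each probe it receives and shut itself down if none arrives within the timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical engine-side watchdog; the name and API are assumptions, not Kyuubi code. */
class ProbeWatchdog {
    private final long timeoutMillis;
    private final AtomicLong lastProbeMillis;
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final Runnable onExpire;

    ProbeWatchdog(long timeoutMillis, long startMillis, Runnable onExpire) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastProbeMillis = new AtomicLong(startMillis);
        this.onExpire = onExpire;
    }

    /** Called whenever the server probes (or otherwise contacts) the engine. */
    void recordProbe(long nowMillis) {
        lastProbeMillis.set(nowMillis);
    }

    /**
     * Returns true if no probe arrived within the timeout.
     * The onExpire callback runs at most once, on the first expired check.
     */
    boolean check(long nowMillis) {
        boolean expired = nowMillis - lastProbeMillis.get() > timeoutMillis;
        if (expired && fired.compareAndSet(false, true)) {
            onExpire.run();
        }
        return expired;
    }

    /** Polls check() on a daemon thread, using the wall clock as the time source. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "probe-watchdog");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(
            () -> check(System.currentTimeMillis()),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In a real engine, the `onExpire` callback would presumably stop the engine and release its YARN resources, and the start time passed to the constructor would be the moment the engine becomes ready. With the CONNECTION share level, that would let an orphaned engine exit after the probe timeout rather than waiting for the longer idle timeout.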
