mukesh154 opened a new pull request, #23829: URL: https://github.com/apache/pulsar/pull/23829
### Motivation

This pull request adds a health check for Kubernetes deployments: a liveness probe for the function worker. A liveness probe lets Kubernetes restart a pod automatically when it fails. With this change, the function worker recovers after a `ProducerFencedException`, which currently leaves the worker stuck.

For instance, when a client makes a request like:

```
curl --location --request PUT 'https://localhost:6651/admin/v3/functions/test/test/test' --header 'Authorization: Bearer <token>' --header '...' --form '[email protected]'
```

For `POST`, `PUT`, and `DELETE` operations, the following error is returned under heavy load:

```json
{"reason":"Internal Error updating function at the leader"}
```

At the same time, the function worker logs the following error:

```
ERROR org.apache.pulsar.functions.worker.FunctionMetaDataManager - Could not write into Function Metadata topic
org.apache.pulsar.client.api.PulsarClientException$ProducerFencedException: Producer was fenced
```

Currently the function worker does not recover from this error, so the failure persists. With this update, the liveness probe fails once the error has occurred and Kubernetes restarts the worker, so that it recovers and continues operating.

### Modifications

This update introduces an API endpoint that performs a liveness check on the function worker pod. The API returns HTTP status `200 (OK)` when the `isLive` flag within `FunctionImpl` is true. If the flag is false, typically after a `ProducerFencedException` has occurred, the API returns `503 (Service Unavailable)`.

The Kubernetes deployment configuration has been updated to use this new endpoint in the liveness probe, alongside the existing `metrics` endpoint, so that Kubernetes can monitor the health and availability of the function worker.
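The flag-to-status mapping described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: the class `WorkerLiveness` and its methods are hypothetical names, and the real change keeps the `isLive` flag inside the function worker and exposes it through the worker's REST layer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: tracks whether the worker is still considered live
// and maps that state to the HTTP status a liveness endpoint would return.
final class WorkerLiveness {
    // Starts as live; flipped to false after an unrecoverable write failure.
    private final AtomicBoolean live = new AtomicBoolean(true);

    // Called by the (assumed) error handler for the metadata topic producer.
    // Only a fenced producer marks the worker as not live in this sketch.
    void recordMetadataWriteFailure(boolean producerFenced) {
        if (producerFenced) {
            live.set(false);
        }
    }

    boolean isLive() {
        return live.get();
    }

    // 200 (OK) while live, 503 (Service Unavailable) otherwise.
    int livenessStatusCode() {
        return isLive() ? 200 : 503;
    }
}
```

Once the flag is false it stays false in this sketch, so repeated probe failures lead Kubernetes to restart the pod, which resets the state.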
### Verifying this change

- [X] Make sure that the change passes the CI checks.

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

### Does this pull request potentially affect one of the following parts:

*If `yes` was chosen, please highlight the changes*

- Dependencies (does it add or upgrade a dependency): (no)
- The public API: (no)
- The schema: (no)
- The default values of configurations: (no)
- The wire protocol: (no)
- The rest endpoints: (no)
- The admin cli options: (no)
- Anything that affects deployment: (no)

### Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

- [ ] `doc-required`
- [X] `no-need-doc`
- [ ] `doc`
