mukesh154 opened a new pull request, #23829:
URL: https://github.com/apache/pulsar/pull/23829

   ### Motivation
   
   This pull request introduces a health check for Kubernetes deployments by
   adding a liveness probe for the function worker. A liveness probe lets
   Kubernetes restart a pod automatically when it fails. This change ensures
   that the function worker recovers after a `ProducerFencedException`, which
   currently leaves the worker stuck with no way to recover on its own.
   
   For instance, when a client makes a request like:
   
   ```
   curl --location --request PUT \
     'https://localhost:6651/admin/v3/functions/test/test/test' \
     --header 'Authorization: Bearer <token>' \
     --header '...' \
     --form '[email protected]'
   ```
   
   For `POST`, `PUT`, and `DELETE` operations, the following error is returned 
under heavy load:
   ```json
   {"reason":"Internal Error updating function at the leader"}
   ```
   
   At the same time, the function worker logs the following error:
   
   ```
   ERROR org.apache.pulsar.functions.worker.FunctionMetaDataManager - Could not write into Function Metadata topic
   org.apache.pulsar.client.api.PulsarClientException$ProducerFencedException: Producer was fenced
   ```
   
   The function worker does not recover from this state on its own, so
   subsequent requests keep failing. With this update, the liveness probe fails
   once this error has occurred and Kubernetes restarts the worker pod, which
   restores normal operation.
   
   ### Modifications
   
   This update introduces an API endpoint that performs a liveness check on the
   function worker pod. The API returns HTTP status `200 (OK)` when the
   `isLive` flag within `FunctionImpl` is true. If the flag is false, typically
   after a `ProducerFencedException` has occurred, the API returns `503
   (Service Unavailable)`.
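
The flag-to-status mapping described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the class name `WorkerLivenessSketch` and the methods `markNotLive` and `livenessStatus` are hypothetical, and the HTTP layer is left out so that only the decision logic is shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of the liveness decision; names are illustrative only. */
public class WorkerLivenessSketch {
    static final int HTTP_OK = 200;
    static final int HTTP_SERVICE_UNAVAILABLE = 503;

    // Starts as true and is flipped to false once an unrecoverable error is seen.
    private final AtomicBoolean isLive = new AtomicBoolean(true);

    /** Assumed to be called from the error path, e.g. after a fenced-producer failure. */
    public void markNotLive() {
        isLive.set(false);
    }

    /** Status code a liveness endpoint would return for the current flag value. */
    public int livenessStatus() {
        return isLive.get() ? HTTP_OK : HTTP_SERVICE_UNAVAILABLE;
    }

    public static void main(String[] args) {
        WorkerLivenessSketch worker = new WorkerLivenessSketch();
        System.out.println(worker.livenessStatus()); // prints 200
        worker.markNotLive();
        System.out.println(worker.livenessStatus()); // prints 503
    }
}
```

In this sketch the flag never returns to true, which matches the intent described here: the only way out of the failed state is a restart of the pod.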
   
   The Kubernetes deployment configuration has been updated to use this new API
   endpoint in the liveness probe, along with the existing `metrics` endpoint,
   so that Kubernetes can monitor the health and availability of the function
   worker.
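
For orientation, a container-level liveness probe of this kind might look like the fragment below. It is a sketch under stated assumptions: the endpoint path is a placeholder because this description does not give it, the port is assumed to be the function worker's HTTP port, and the timing values are illustrative rather than taken from the actual deployment change.

```yaml
# Illustrative livenessProbe fragment; not copied from this pull request.
livenessProbe:
  httpGet:
    path: /<liveness-endpoint-path>   # placeholder: path is not stated in this description
    port: 6750                        # assumed function worker HTTP port
  initialDelaySeconds: 30             # illustrative timing values
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

Kubernetes treats an HTTP status from 200 to 399 as a successful probe, so a `503` response counts as a failure. With the illustrative values above, the container would be restarted after three consecutive failed probes.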
   
   ### Verifying this change
   
   - [X] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer 
privilege).
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   - [X] `no-need-doc` 
   - [ ] `doc` 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
