navinvishy commented on PR #357:
URL: https://github.com/apache/flink-kubernetes-operator/pull/357#issuecomment-2475381388

   Hi @gyfora, we have a situation where nodes are periodically taken down for
maintenance, which causes jobs to restart. Several restarts of a job within a
few hours often lead to growing consumer lag. It appears that task-local
recovery, as described
[here](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/state/large_state_tuning/#task-local-recovery)
and
[here](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#enabling-local-recovery-across-pod-restarts),
would address this.
   
   Since we use the Flink Kubernetes Operator, I landed on this PR while looking
for ways to enable this in the operator. I'm happy to take this forward if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.