snjypl commented on issue #21087: URL: https://github.com/apache/airflow/issues/21087#issuecomment-1121052643
@ecerulm as i understand there are two aspects to this issue: **1.) preventing 410 error from happening:** To prevent it we need to use the BOOKMARK feature. it was specifically introduced for this purpose. the prevention should be done by `kubernetes python client` in the stream loop [here]( https://github.com/kubernetes-client/python/blob/34ed13f9c729eea0b7581887ae0a1b0018664c5a/kubernetes/base/watch/watch.py#L162 ) now because of a bug in the kubernetes python client, the BOOKMARK events were not used correctly. the following pull requests seeks to fix it. https://github.com/kubernetes-client/python/pull/1796 it is because of this bug in kubernetes python client that you were getting 410 error even with `allow_watch_bookmark` enabled. **2.) handling the error once it happens.** now with or without `allow_watch_bookmarks` enabled , in case 410 error occurs the client is expected to restart the watch with resource_version ="0" or None [ None is preferred unless you have a need to trace from the beginning] i understand your pull request seeks to correctly address the `handling 410 part` of the issue. my view is, there is no need airflow to keep track of the last_resource_version or even have ResoureVersion singleton. it is not airflow job to do the actual 'watching' of resource. if Kubernetes client throws 410 error for a particular resourceVersion, then even if you restart the watch process with that resourceVersion you are bound to get the same error. i have opened a WIP pull request https://github.com/kubernetes-client/python/pull/1796 for a better discussion of this issue. @ecerulm i went through your PR, and i can see that you are addressing the 'handle 410' part of the issue. my suggestion would be that, we can catch the `ApiException` and call `_run` with resource_version="0" if `e.status == 410`. @cansjt has raised the concern about relying on the status for this particular error, but i believe it is safe, 410 is raised only for this particular error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
