ecerulm commented on issue #21087:
URL: https://github.com/apache/airflow/issues/21087#issuecomment-1121560242

   > 410 does not happen due to network issues or connectivity issue between airflow and kubeapi server.
   
   I have reproduced this locally with minikube and the Kubernetes Python client, so I can assure you it CAN happen due to network issues between Airflow and the k8s API. Let me explain the setup:
   
   1. Minikube running locally.
   2. A Python script that performs a watch in a `while` loop, just like Airflow does.
   3. The Python script connects to minikube via toxiproxy, so that I can simulate a network disconnection (context: `minikube2`).
   4. As soon as I simulate the disconnection, the watch exits with the exception `("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))`.
   5. The Python script remembers the last resource version, just like Airflow does.
   6. The Python script continuously retries the watch with the last known `resource_version=5659`, failing each time with `HTTPSConnectionPool(host='127.0.0.1', port=2222): Max retries exceeded with url: /api/v1/namespaces/testns2/pods?resourceVersion=5659&watch=True (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x107b02c40>: Failed to establish a new connection: [Errno 61] Connection refused'))`.
   7. In another window, I create/delete deployments in that namespace (via context `minikube`, which does not go through the toxiproxy). I do this to create new events.
   8. After 6 minutes, I re-enable the toxiproxy.
   9. The Python script retries the watch with `resource_version=5659`; it connects, and I get a `(410) Reason: Expired: too old resource version: 5659 (6859)`.
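
The steps above can be modeled without a cluster. This is a toy model only. `ToyApiServer`, `Gone` and `reproduce` are hypothetical names and are not part of the kubernetes client or the Kubernetes API. The toy server keeps only the last `retained` events. That is a deliberate simplification: real retention comes from etcd compaction and the apiserver's watch cache, and neither is count-based like this. A watch that asks to resume from a version older than that window gets the toy equivalent of the 410.

```python
class Gone(Exception):
    """Toy stand-in for ApiException(status=410, reason="Expired")."""


class ToyApiServer:
    """Hypothetical in-memory server that only retains the newest events."""

    def __init__(self, retained: int) -> None:
        self.retained = retained
        self.rv = 0                   # latest resource version handed out
        self.history: list[int] = []  # resource versions still replayable
        self.reachable = True         # False simulates the toxiproxy cut

    def write(self) -> None:
        """A create/delete (step 7) produces a new resource version."""
        self.rv += 1
        self.history.append(self.rv)
        # Toy compaction: forget everything outside the retention window.
        self.history = self.history[-self.retained:]

    def watch(self, since_rv: int) -> list[int]:
        """Return versions newer than since_rv, or fail like the real server."""
        if not self.reachable:
            raise ConnectionError("Connection refused")  # step 6
        oldest = self.history[0] if self.history else self.rv + 1
        if since_rv < oldest - 1:  # the event right after since_rv is gone
            raise Gone(f"too old resource version: {since_rv} (oldest retained: {oldest})")
        return [v for v in self.history if v > since_rv]


def reproduce(outage_writes: int, retained: int) -> str:
    server = ToyApiServer(retained)
    for _ in range(5):
        server.write()
    last_rv = server.watch(0)[-1]   # step 5: remember the last resource version
    server.reachable = False        # steps 3-4: the connection drops
    for _ in range(outage_writes):  # step 7: events keep happening meanwhile
        server.write()
        try:
            server.watch(last_rv)   # step 6: every retry is refused
        except ConnectionError:
            pass
    server.reachable = True         # step 8: toxiproxy re-enabled
    try:
        server.watch(last_rv)       # step 9: retry with the remembered version
        return "resumed"
    except Gone as exc:
        return f"410 Gone: {exc}"


print(reproduce(outage_writes=3, retained=10))
print(reproduce(outage_writes=20, retained=10))
```

With a short outage, the remembered version is still inside the retained window and the watch resumes. Once enough writes happen during the outage, the same retry fails with the toy 410. That happens no matter how carefully the client tracked its last version before the disconnect.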
   
   
   
   > it happens when no event of type ADDED, MODIFIED, DELETED happens on the watched resource for a long time [ ~ 5 mins]
   
   I've been running a watch with the Kubernetes Python client against a namespace where there were no new events at all for 1 hour, and I did not get an `ApiException(410)`. So, are you sure of this? Have you ever seen this yourself in your Kubernetes environment?
   
   > @ecerulm please go through the docs. it will help you understand why 410 occurs and how BOOKMARK will prevent it.
   > it would really help the conversation if you would take the time to go through the KEP and other docs.
   
   I did read all the documents, and I think I understand them reasonably well. I have also done actual testing, and I try to back up what I say by doing it.
   
   I think you mean something else by "prevent".
   
   I hope the scenario I included in this comment helps you understand why a 410 occurs in the event of network issues and why BOOKMARK **can't prevent** that. In principle, BOOKMARK helps to get a better "last known resource version" at step 5, but by the time step 8 is reached that resource version won't be valid anymore (if enough time has passed). And this is not theory; it's something you can actually test and reproduce yourself, like I did.
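
Here is a similarly hedged sketch of where BOOKMARK does and doesn't help. To make the comparison visible, it assumes numeric resource versions and a known oldest retained version. Real clients must treat `resourceVersion` as opaque and leave that check to the server. `last_known_rv`, `resume` and `ev` are hypothetical helpers. The event dicts follow the watch event shape: a `type`, plus `object.metadata.resourceVersion`.

```python
def last_known_rv(start_rv: str, events: list[dict], use_bookmarks: bool) -> str:
    """Resource version a watcher would resume from after seeing `events`."""
    rv = start_rv
    for event in events:
        if event["type"] == "BOOKMARK" and not use_bookmarks:
            continue  # a client ignoring bookmarks keeps its older version
        rv = event["object"]["metadata"]["resourceVersion"]
    return rv


def resume(rv: str, oldest_retained: int) -> str:
    """Hypothetical server-side check: 410 once history after rv is gone.

    Comparing versions numerically is a toy simplification; real clients
    must treat resourceVersion as an opaque string.
    """
    return "ok" if int(rv) >= oldest_retained - 1 else "410 Gone"


def ev(kind: str, rv: int) -> dict:
    """Build a minimal watch event (type + object.metadata.resourceVersion)."""
    return {"type": kind, "object": {"metadata": {"resourceVersion": str(rv)}}}


# Idle watched namespace: one ADDED event, then only bookmarks while writes
# elsewhere advance the resource version.
stream = [ev("ADDED", 100), ev("BOOKMARK", 2000), ev("BOOKMARK", 4000)]
plain = last_known_rv("0", stream, use_bookmarks=False)
bookmarked = last_known_rv("0", stream, use_bookmarks=True)

# Ordinary reconnect (e.g. after a server-side watch timeout): bookmarks help.
print(resume(plain, oldest_retained=3500), resume(bookmarked, oldest_retained=3500))
# Long outage, as in steps 6-8: the window moved on, bookmarks cannot help.
print(resume(plain, oldest_retained=9000), resume(bookmarked, oldest_retained=9000))
```

Consider a window that has only moved a little, as on an ordinary reconnect. The bookmarked version resumes, while the stale one gets a 410. Now consider a window that moved on during a long outage. Both versions get a 410. Either way, the client still needs a 410 fallback: re-list and start a fresh watch from the returned version.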


