potiuk commented on pull request #18538:
URL: https://github.com/apache/airflow/pull/18538#issuecomment-927622890


   I have a fairly plausible hypothesis as to why this could help:
   
   * When you look at https://github.com/actions/runner/issues/510, you 
will see that `--once` has a race condition where a second job can be picked up
   * The issue describes it as resulting in "lost communication with the 
server for the second job". We saw this happening in the past, but I do not 
recall seeing it over the last few weeks. I think it was caused by the server 
dropping the connection for runners which had completed a job with `--once`
   * I think they introduced the `--ephemeral` flag as a different 
approach because they could not easily fix `--once` due to its architecture, but I 
also believe they changed the server so that it accepts more jobs coming from 
a `--once` runner, because this is the only way they could get rid of the `lost 
communication` problem for anyone (like us) who was already using `--once`
   
   I know for a fact that there is a hidden `.system` folder in the mssql data 
volume, and the previous `rm -rf -- /mssql/*` would skip it when deleting, 
because the `*` glob does not match names starting with a dot:
   
   ```
   root@2ca4ff7de277:/var/opt/mssql# ls -la
   total 0
   drwxr-xr-x 1 root root  42 Sep 26 17:22 .
   drwxr-xr-x 1 root root  10 Sep 26 17:22 ..
   drwxr-xr-x 1 root root  72 Sep 26 17:22 .system
   drwxr-xr-x 1 root root 224 Sep 26 17:22 data
   drwxr-xr-x 1 root root 334 Sep 27 07:52 log
   drwxr-xr-x 1 root root  22 Sep 26 17:22 secrets
   ```
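
To illustrate the glob behaviour, here is a minimal sketch that can be run in a scratch directory. The `volume` temp directory is a hypothetical stand-in for the real mssql data volume, and the `find` variant at the end is just one possible way to also remove hidden entries, not necessarily what this PR does:

```shell
# Hypothetical scratch directory standing in for the mssql data volume.
volume="$(mktemp -d)"
mkdir "$volume/.system" "$volume/data" "$volume/log" "$volume/secrets"

# With default shell options, `*` does not match dot-prefixed names,
# so the hidden .system directory is never passed to rm.
rm -rf -- "$volume"/*
remaining="$(ls -A "$volume")"
echo "left after 'rm -rf -- dir/*': $remaining"

# One possible alternative (an assumption, not taken from the PR):
# delete everything below the directory, hidden entries included.
find "$volume" -mindepth 1 -delete
after_find="$(ls -A "$volume")"
echo "left after 'find -mindepth 1 -delete': $after_find"
```

The first `echo` shows that only the hidden directory is left behind by the glob-based cleanup, which matches the `ls -la` listing above.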
   
   So I think this is a very, very plausible hypothesis.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

