potiuk commented on issue #27476: URL: https://github.com/apache/airflow/issues/27476#issuecomment-1314042656
> We default to 60s (which is a mistake imo, PR incoming!). You can't go beyond "0" without some risk of task failures from being out of sync

There is no magic here, again. Shared volumes give you no sync guarantees either, because there are no atomicity guarantees. Actually it is FAR worse (and our users have suffered from it). In most cases of shared volumes (as far as I know) this risk is far greater: you can have partially synchronized directories and partially synchronized files. With Amazon EFS at low IOPS, consistency of your local view of the DAG folder is simply impossible, because NFS synchronizes and flushes each file separately. With any sizeable shared volume the problem goes beyond delays: you can locally have a new DAG importing an old version of a library, or a new library imported by an old DAG, with absolutely no control over it. You can observe arbitrary snapshots of arbitrary versions of the files that were copied to the shared volume at the other end. This is a major source of instability for users with EFS and low IOPS, and it likely causes occasional untraceable errors when race conditions happen even if your IOPS are high. I think the effect of that is way worse than "slight" sync delays, especially if you are not aware of it.

With GitSync, because of the atomic replacement (a symbolic link swap for the whole DAG folder), we at least have a guarantee of consistency (see the first sketch at the end of this comment).

And it gets FAR worse when you combine GitSync + shared volumes. Far, Far, Far worse, precisely because GitSync does this atomic replacement. The way GitSync works, there are points in time where GitSync holds exactly TWO FULL COPIES of the DAG folder: one old and one new. When GitSync retrieves a new commit, it creates a copy of the FULL DAG FOLDER, and once the pull is complete it repoints a symbolic link to that new copy. That process is extremely heavy on shared folders. Assume NFS:

1. New files are written (and NFS starts syncing them while GitSync works).
2. Once the checkout is finished (NFS is still syncing), GitSync repoints the symbolic link to the new directory.
3. Now NFS effectively has to delete all the files reachable through the old link and replace them with the files from the new link. They are entirely different files, with no relation whatsoever, so NFS has to synchronize all of them again.
4. All of this happens while the files are continuously read by multiple ends.

The effect is that GitSync, even with a single line of change, causes an avalanche of changes in an NFS-based system: the whole DAG folder is deleted and recreated from scratch on the remote ends with every single commit (the second sketch below illustrates why). Of course, various versions of NFS and shared volumes have some optimisations, but none of them is prepared for a scenario where a whole huge DAG folder is suddenly replaced by another one, which is what GitSync does with every single incoming change.

> Generally, yes, but I'm not ready to say "never".

Well, considering the way the Airflow Scheduler accesses the files, how GitSync's atomic replacement + a shared volume cause an avalanche of traffic and communication, and that shared volumes provide NO consistency guarantee, I am quite ready to say that "GitSync + shared volumes" is never.

> Don't get me wrong though, I'm not going to die on this hill. I'm more of a "bake it into the image" + KubernetesExecutor guy anyways :man_shrugging:

I am also not dying on that hill. I just very strongly advocate for it.
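To make the atomicity argument concrete, here is a minimal Python sketch of the publish-by-symlink-swap pattern described above. git-sync itself is written in Go; the function and path names here are invented for illustration:

```python
import os

def publish_atomically(dags_root: str, new_worktree: str) -> None:
    """Repoint dags_root/current at a freshly checked-out worktree.

    Because rename(2) is atomic on POSIX filesystems, a reader that
    resolves dags_root/current sees either the complete old tree or
    the complete new tree, never a half-updated mix of the two.
    """
    link = os.path.join(dags_root, "current")
    tmp_link = link + ".tmp"
    # Build the new symlink under a temporary name first...
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_worktree, tmp_link)
    # ...then rename it over the old link in a single atomic step.
    os.replace(tmp_link, link)
```

This single atomic cut-over point is exactly what a plain shared volume cannot give you: NFS propagates file by file, with no equivalent of one instant where the whole folder flips from old to new.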
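And a sketch of why the swap looks like a full rewrite to NFS: assuming (as described above) that each commit is checked out into a fresh worktree, every path under the repointed link resolves to a brand-new inode, even for files whose content did not change, so nothing on the client side can be treated as cached and valid. The `/dags/current` path is again only an assumed layout:

```python
import os

def snapshot_inodes(link: str) -> dict:
    """Map every file under the resolved link to its inode number."""
    root = os.path.realpath(link)
    inodes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            inodes[os.path.relpath(path, root)] = os.stat(path).st_ino
    return inodes

before = snapshot_inodes("/dags/current")
# ... git-sync fetches one small commit and swaps the symlink ...
after = snapshot_inodes("/dags/current")

# Every file resolves to a new inode after the swap, so from the
# filesystem's point of view the whole folder was deleted and recreated.
new_files = sum(1 for f in after if after[f] != before.get(f))
print(f"{new_files} of {len(after)} files have new inodes")
```

On a local disk that inode churn is cheap; exported over NFS, it is the avalanche described above.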
