VladaZakharova commented on PR #32365: URL: https://github.com/apache/airflow/pull/32365#issuecomment-1621447736
> I am afraid this is going to introduce more problems than it solves.
>
> There are a few problems with it:
>
> * you need to be able to have a service account that can create buckets - that's a new capability
> * the bucket is deleted immediately when the lock is released, which means that other tasks will have the bucket deleted while they are waiting for the lock. This will cause failures
> * this is very brittle and not self-healing. If you have one task abruptly killed while it holds the lock (happens when you have a hardware failure, for example) - your lock will remain and it will hold the whole DAG from re-running until you manually remove the lock
> * In the future when we have AIP-44 implemented, tasks won't be able to query the DB of Airflow directly so your query about parallel tasks will fail (when AIP-44 is complete we are going to forbid built-in operators in all providers to run any DB query).
>
> So there are many things wrong here. I cannot propose a different solution, but I think it should be based on the state of metadata - and metadata should be queried before they are updated. That might at least mitigate part of the problem. Ideally some kind of optimistic lock strategy on metadata would be great (i.e. an automatically updated version number that you can use to get an optimistic lock strategy - but I am afraid the metadata API does not support it), or maybe some other mechanism that provides a lock - but not based on a manually managed GCS lock, something that will automatically release the lock when the client gets killed. Not sure what it could be - but the solution proposed here is going IMHO to cause more problems than it solves.

Thank you for the ideas! Yes, it looks like I was trying to solve the problem, but there are a lot of unpredictable problems that could happen during DAG execution. I will try to find something else here using the metadata you mentioned :)

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
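
For reference, the optimistic lock strategy mentioned in the quoted review (a version number that is bumped on every write) could look roughly like the sketch below. It is only an illustration: `VersionedStore`, `VersionConflict`, and `update_with_retry` are hypothetical names, the store is an in-memory stand-in, and nothing here reflects the real metadata API, which the quoted comment suspects has no such support.

```python
import threading


class VersionConflict(Exception):
    """Raised when the record changed between our read and our write."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a metadata store with a version counter."""

    def __init__(self):
        self._value = {}
        self._version = 0
        # Guards only the compare-and-set itself, never held between read and write.
        self._mutex = threading.Lock()

    def read(self):
        """Return a copy of the current value together with its version."""
        with self._mutex:
            return dict(self._value), self._version

    def write(self, new_value, expected_version):
        """Store new_value only if nobody has written since expected_version."""
        with self._mutex:
            if self._version != expected_version:
                raise VersionConflict(
                    f"expected version {expected_version}, found {self._version}"
                )
            self._value = dict(new_value)
            self._version += 1
            return self._version


def update_with_retry(store, mutate, max_attempts=5):
    """Read, apply mutate, and write back, retrying if another writer got in first."""
    for _ in range(max_attempts):
        value, version = store.read()
        try:
            return store.write(mutate(value), version)
        except VersionConflict:
            continue  # someone else updated the record; re-read and try again
    raise VersionConflict(f"gave up after {max_attempts} attempts")
```

Because nothing is held between the read and the write, a task that is killed halfway through leaves no lock behind; the worst case is that its update is simply never applied.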
