lokeshlal commented on issue #6975: Dynamic pooling via allowing tasks to use more than one pool slot (depending upon the need) URL: https://github.com/apache/airflow/pull/6975#issuecomment-569917507

@tooptoop4 Yes, the approach looks good when multiple pools are required, as described in the JIRA ticket. This will be useful in a scenario where we have a Spark cluster to which jobs of different complexity (large jobs, medium jobs, etc.) need to be submitted, and each job requires a different amount of cluster capacity. Dynamic pooling would let Airflow control Spark cluster capacity directly through pools. This is aligned with the following JIRA ticket: https://issues.apache.org/jira/browse/AIRFLOW-1467

The problem statement in https://issues.apache.org/jira/browse/AIRFLOW-6227 can instead be handled by taking a write lock on a file. That is, if the requirement is to allow only one writer per table, then before triggering the Spark job, add a task that acquires a write lock on a file named after the table in the file system (libraries such as fasteners or lockfile can be used inside a PythonOperator). This ensures that only one job runs at a time for a given table, and it keeps the code dynamic rather than requiring a new pool every time a new table is introduced. Once the Spark job finishes (whether it passes or fails), release the lock on the file.
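A minimal sketch of the per-table locking idea described above. Assumptions: the lock directory, table name, and function names are all illustrative, and the standard-library `fcntl.flock` (Unix-only) is used here in place of fasteners or lockfile, which expose a similar acquire/release API; in an actual DAG these two functions would be called from PythonOperator callables placed before and after the Spark-submit task.

```python
# Sketch, not Airflow code: per-table file locking so only one writer job
# runs against a given table at a time. Paths and names are hypothetical.
import fcntl
import os

LOCK_DIR = "/tmp/table_locks"  # hypothetical location for lock files


def acquire_table_lock(table_name):
    """Take an exclusive lock on a file named after the table.

    Blocks until no other holder remains. Returns the open file object;
    it must stay open for the lifetime of the lock.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    f = open(os.path.join(LOCK_DIR, table_name + ".lock"), "w")
    fcntl.flock(f, fcntl.LOCK_EX)  # exclusive lock: one writer per table
    return f


def release_table_lock(f):
    """Release the lock, whether the Spark job passed or failed."""
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()
```

In a DAG this would typically look like `acquire_lock_task >> spark_submit_task >> release_lock_task`, with the release task configured to run even on upstream failure (e.g. `trigger_rule="all_done"`) so a failed job does not leave the table locked.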
