lokeshlal edited a comment on issue #6975: Dynamic pooling via allowing tasks 
to use more than one pool slot (depending upon the need)
URL: https://github.com/apache/airflow/pull/6975#issuecomment-569917507
 
 
   @tooptoop4 Yes, the approach looks good when multiple pools are required, as 
described in the Jira ticket.  
   This PR will be useful in a scenario where we have a Spark cluster to which 
jobs of different complexity (large jobs, medium jobs, etc.) need to be 
submitted, and each job requires a different share of the cluster's capacity. 
Dynamic pooling can help control the Spark cluster capacity directly from 
Airflow using pools. This is aligned with the following Jira 
ticket: https://issues.apache.org/jira/browse/AIRFLOW-1467 
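   The scenario above could be sketched as a DAG fragment like the following, where heavy and light jobs draw different numbers of slots from one shared pool. This is purely illustrative of the proposed feature: the `pool_slots` argument, the pool name `spark_cluster`, and the task names are assumptions, not settled API.

   ```python
   # Illustrative sketch only: assumes a task-level `pool_slots` argument as
   # proposed in this PR. Pool name and commands are hypothetical.
   from airflow.operators.bash import BashOperator

   # A large Spark job claims 4 slots of the shared "spark_cluster" pool...
   large_job = BashOperator(
       task_id="large_spark_job",
       bash_command="spark-submit --executor-memory 8g large_job.py",
       pool="spark_cluster",
       pool_slots=4,
   )

   # ...while a medium job claims only 2, so the scheduler can pack more
   # small jobs onto the same cluster capacity.
   medium_job = BashOperator(
       task_id="medium_spark_job",
       bash_command="spark-submit --executor-memory 2g medium_job.py",
       pool="spark_cluster",
       pool_slots=2,
   )
   ```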
   
   The problem statement mentioned in the Jira ticket 
https://issues.apache.org/jira/browse/AIRFLOW-6227 can be handled by locking 
a file for writing. That is, if the requirement is to keep a single writer per 
table, then before triggering the Spark job, create another task that acquires 
a write lock on a file (named after the table) in the file system (libraries 
such as fasteners or lockfile can be used in a PythonOperator). This ensures 
that only one job is triggered at a time for the given table and keeps the 
code dynamic, rather than requiring a new pool every time a new table is 
introduced. Once the Spark job finishes (whether it fails or passes), release 
the lock on the file.
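
   The locking step above could look roughly like this. The comment suggests fasteners or lockfile; this sketch uses the stdlib `fcntl` module to the same effect (POSIX only), and the function and directory names are illustrative, not Airflow APIs.

   ```python
   # Minimal sketch of the per-table write-lock idea: hold an exclusive lock
   # on a file named after the table while the job runs, and release it even
   # if the job fails. `run_with_table_lock` and `lock_dir` are hypothetical.
   import fcntl
   import os

   def run_with_table_lock(table_name, job, lock_dir="/tmp/table_locks"):
       """Run `job` while holding an exclusive lock for `table_name`."""
       os.makedirs(lock_dir, exist_ok=True)
       lock_path = os.path.join(lock_dir, f"{table_name}.lock")
       with open(lock_path, "w") as lock_file:
           # Blocks until no other process holds the lock for this table,
           # so at most one writer runs against the table at a time.
           fcntl.flock(lock_file, fcntl.LOCK_EX)
           try:
               return job()  # e.g. submit the Spark job and wait for it
           finally:
               # Release whether the job passed or failed.
               fcntl.flock(lock_file, fcntl.LOCK_UN)
   ```

   In a DAG, `job` would be the callable that submits the Spark job, wrapped in a PythonOperator, so no new pool is needed when a new table appears.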

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
