lokeshlal commented on issue #6975: Dynamic pooling via allowing tasks to use 
more than one pool slot (depending upon the need)
URL: https://github.com/apache/airflow/pull/6975#issuecomment-569917507
 
 
   @tooptoop4 Yes, the approach looks good when multiple pool slots are required, as described in the JIRA ticket.
   This will be useful in a scenario where we have a Spark cluster to which jobs of different complexity (large jobs, medium jobs, etc.) need to be submitted, and each job requires a different amount of cluster capacity. Dynamic pooling lets Airflow control Spark cluster capacity directly through pools. This is aligned with the following JIRA ticket: https://issues.apache.org/jira/browse/AIRFLOW-1467
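   Conceptually, a pool whose tasks may occupy more than one slot behaves like a weighted counting semaphore: a large job takes several units of capacity, a small one takes one. A minimal stdlib sketch of that accounting (not Airflow's actual implementation, just an illustration of the idea):

```python
import threading

class SlotPool:
    """Illustrative sketch of a pool where a task may occupy
    more than one slot, as the PR proposes. Not Airflow code."""

    def __init__(self, slots):
        self.free = slots
        self.cond = threading.Condition()

    def acquire(self, slots=1):
        # Block until enough free slots exist, then claim them atomically.
        with self.cond:
            while self.free < slots:
                self.cond.wait()
            self.free -= slots

    def release(self, slots=1):
        # Return the slots and wake any waiting tasks.
        with self.cond:
            self.free += slots
            self.cond.notify_all()

# An 8-slot "spark" pool: a large job claims 4 slots, a medium job 2.
pool = SlotPool(8)
pool.acquire(4)  # large job starts
pool.acquire(2)  # medium job starts
pool.release(4)  # large job finishes
```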
   
   The problem statement in JIRA ticket https://issues.apache.org/jira/browse/AIRFLOW-6227 can be handled via a file write lock. That is, if the requirement is to keep a single writer per table, then before triggering the Spark job, create a task that acquires a write lock on a file (named after the table) in the file system (libraries such as fasteners or lockfile can be used in a PythonOperator). This ensures that only one job at a time is triggered for the given table, and it keeps the setup dynamic rather than requiring a new pool every time a new table is introduced. Once the Spark job finishes (whether it fails or passes), release the lock on the file.
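   A minimal sketch of the locking step above, using the stdlib `fcntl` module on POSIX (the comment suggests fasteners or lockfile, which wrap the same mechanism more portably); the lock directory and `submit_with_lock` helper are hypothetical names for illustration:

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def table_write_lock(table_name, lock_dir="/tmp/table_locks"):
    """Hold an exclusive lock on a file named after the table, so only
    one writer job runs per table at a time. Sketch only; in practice
    fasteners.InterProcessLock offers the same behavior portably."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, f"{table_name}.lock")
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other writer holds it
        yield lock_path
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)  # released even if the job failed
        os.close(fd)

def submit_with_lock(table_name):
    """Hypothetical PythonOperator callable: lock, run, release."""
    with table_write_lock(table_name):
        pass  # the Spark job submission for table_name would go here
```

   Because the lock is scoped by `with`, it is released whether the wrapped job passes or fails, matching the requirement above.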

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
