potiuk commented on PR #29401: URL: https://github.com/apache/airflow/pull/29401#issuecomment-1423860810
> +1 -- Agreed with @jedcunningham . That is one of the things we said the user-community chart did that we would never do when we started building the official helm chart for Airflow (or so I remember :) )

+1 too. Managing pools this way is a bad idea for the reasons @jedcunningham nicely laid out.

> We had discussion on that. See https://github.com/apache/airflow/issues/18582

As @eladkal mentioned, this would require redefining the way our SQL queries are done in the scheduler. There is a certain complexity of Pools in a multi-scheduler context that makes it a bit complex (but not impossible, I think). I once assessed this might be hard. But I looked closely at how it works and I **think** we could possibly attempt to change how the Pools table is used. Either the code was different when I looked at it in #18582, or maybe I had not thought about this idea before, but let me explain the context:

* The Pool table is currently locked during scheduling as a critical section in `_executable_task_instances_to_queued`:

```
# Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
# Throws an exception if lock cannot be obtained, rather than blocking
pools = Pool.slots_stats(lock_rows=True, session=session)
```

* And the `slots_stats` method uses this query to lock the Pool table:

```
query = session.query(Pool.pool, Pool.slots)
```

What happens next is that the method builds a "PoolStats" dict and marks tasks as executable as long as they still have "free slots" (using the in-memory dict). Only then does it release the lock on Pools. This is in order to accommodate multiple schedulers: the critical section makes sure schedulers will not mark dag runs as executable when Pools would be over-subscribed.

So while the Pools table is indeed locked and used in the query, fundamentally I think we could take the "total" values from ENV variables rather than from the Pools table.
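To make the accounting concrete, here is a minimal, self-contained sketch (not Airflow's actual code) of what happens inside the critical section as described above: totals live in an in-memory dict (today built from the locked Pool rows), and candidate tasks are queued only while their pool still has free slots. The names `PoolStats` and `queue_executable` are illustrative, not Airflow's real API:

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    total: int     # configured slot count for the pool
    occupied: int  # slots already taken by running/queued tasks

    @property
    def open(self) -> int:
        return self.total - self.occupied


def queue_executable(candidates, stats):
    """Return task ids that fit into their pool's remaining slots.

    candidates: (task_id, pool_name) pairs in priority order.
    The in-memory dict is mutated as slots are claimed, mirroring how the
    scheduler consumes PoolStats while the Pool rows stay locked.
    """
    queued = []
    for task_id, pool in candidates:
        pool_stats = stats.get(pool)
        if pool_stats is None or pool_stats.open < 1:
            continue  # unknown or exhausted pool: skip this task
        pool_stats.occupied += 1
        queued.append(task_id)
    return queued


stats = {"default_pool": PoolStats(total=2, occupied=1)}
candidates = [("t1", "default_pool"), ("t2", "default_pool"), ("t3", "other")]
print(queue_executable(candidates, stats))  # only t1 fits the remaining slot
```

The point of the sketch: nothing in this loop cares *where* `stats[...].total` came from — the DB lock only guarantees that the totals and occupied counts stay consistent while the loop runs.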
It is somewhat coincidental that we use "totals" from the Pools table - it's convenient, and the code already has Pool objects returned by the query to build PoolStats, but it's not **really** necessary. The main use of the Pool table here is that the whole table is locked while the critical section happens.

Locking the Pool table also has a nice side effect: when we use the UI or CLI to modify pools, we won't change the pool size while the scheduler keeps the table locked in the critical section - so a pool size update will never happen while the scheduler is in it. But with env variables it is impossible to change pool sizes without restarting Airflow, so this is not really an argument if we use them. We could even have a separate lock for this critical section in `_executable_task_instances_to_queued` - it does not have to be a Pools table lock.

There seems to be no referential integrity between Pools and any other table, so I think we would just have to modify the `slots_stats` function to retrieve totals from elsewhere (env vars) and things would work. The `pool` column in TaskInstance is just a string (part of an index, but not part of referential integrity with the Pool table):

```
pool = Column(String(256), nullable=False)
```

I still do not like the "multiple sources of truth" in this case. But there is potentially an option to disable pool management in the UI entirely and replace it with pool counts managed entirely in env vars only.

Also, I think changing the approach in this area is something we want anyway, so we could kill two birds with one stone - see https://github.com/apache/airflow/issues/29416: our pool accounting does not currently take deferred tasks into account, but there are cases where it would be desired.
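A rough sketch of the "totals from env vars" idea: pool sizes are parsed from environment variables instead of being read from the Pool table. Note that the `AIRFLOW__POOLS__<NAME>` naming below is purely hypothetical for illustration - no such convention exists in Airflow today, and the actual naming would be decided in the proposal itself:

```python
import os

# Hypothetical prefix -- the real convention, if any, would come from the proposal.
PREFIX = "AIRFLOW__POOLS__"


def pool_totals_from_env(environ):
    """Build {pool_name: slot_count} from env vars like AIRFLOW__POOLS__MY_POOL=16.

    This would replace the `session.query(Pool.pool, Pool.slots)` source of
    totals in a slots_stats-like function; the occupied counts would still
    come from the TaskInstance table as they do today.
    """
    totals = {}
    for key, value in environ.items():
        if key.startswith(PREFIX):
            pool_name = key[len(PREFIX):].lower()
            totals[pool_name] = int(value)
    return totals


os.environ["AIRFLOW__POOLS__DEFAULT_POOL"] = "128"
print(pool_totals_from_env(dict(os.environ)))
```

Because the `pool` column on TaskInstance is a plain string with no foreign key, swapping the source of totals like this would not require any schema change - only the locking strategy for the critical section would need to be rethought (e.g. a dedicated lock instead of locking the Pool rows).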
