dstandish commented on pull request #21829:
URL: https://github.com/apache/airflow/pull/21829#issuecomment-1057405118


> > I think it's a good idea to improve messaging around this but I'm not sure the best way. I think that ideally we allow the dag to exist, but maybe just bubble up a "flash" message alerting with the offending dag and task.
>
> The problem with the dag existing is that the scheduler would keep on trying to queue it. If you check the linked issue, it blocks some other tasks from being queued.
   
Ah, I see. Well, alternatively we could amend the query to filter on `pool in pools`, or join to the pools table.
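
To make that concrete, here's a minimal sketch of what the join could look like. This is not the scheduler's real query; the helper name and the exact query shape are assumptions for illustration only.

```python
# Rough sketch: restrict candidate task instances to those whose pool
# actually exists, by joining to the pools table. Task instances that
# reference a missing pool never match the join, so the scheduler would
# simply stop considering them instead of crash looping.
from airflow.models.pool import Pool
from airflow.models.taskinstance import TaskInstance
from airflow.utils.state import State


def task_instances_with_existing_pool(session):
    return (
        session.query(TaskInstance)
        .join(Pool, TaskInstance.pool == Pool.pool)
        .filter(TaskInstance.state == State.SCHEDULED)
    )
```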
   
> > One reason is, if you are developing locally, you might want to use `airflow dags list` to verify your dag parses ok. Or you might want to run `airflow dags test` or `airflow tasks test` to run your task, without creating the pool. And what if perhaps you remove the pool to "temporarily disable" a set of tasks? Maybe in this case it's better to alert about "misconfiguration" but allow the dag to remain in the system.
>
> I thought about not creating the dag run if a task has a wrong pool, but it looks like raising an error at that point could keep the scheduler crash looping.
   
I'm not sure we'd need to throw an error there. Couldn't we just not create the dag run? Or in any case, if we filter on "pool exists" in the query above (as in the join sketch), it wouldn't even be _considered_ for scheduling.
   
Perhaps alongside `import_errors` we could add an attribute `configuration_errors` or something to DagBag, and then use it to bubble up a flash alert like [here](https://github.com/apache/airflow/blob/08575ddd8a72f96a3439f73e973ee9958188eb83/airflow/www/views.py#L784-L788).
   
If we had a `configuration_errors` mechanism, this would not be a hard error but something we'd want to warn the user about. In this scenario we could also warn if a pool has size 0. Thinking out loud a bit here; a sketch follows.
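
Something along these lines (note that `collect_configuration_errors`, the message format, and the specific checks are all made up for illustration; only `DagBag.import_errors` and the `Pool` model exist today):

```python
# Sketch of a DagBag-level "soft error" pass, analogous to import_errors
# but for misconfigurations that shouldn't block parsing.
from airflow.models.pool import Pool


def collect_configuration_errors(dagbag, session):
    """Return {dag_id: [warnings]} for soft misconfigurations."""
    pool_slots = {p.pool: p.slots for p in session.query(Pool).all()}
    errors = {}
    for dag_id, dag in dagbag.dags.items():
        for task in dag.tasks:
            if task.pool not in pool_slots:
                errors.setdefault(dag_id, []).append(
                    f"task {task.task_id!r} uses non-existent pool {task.pool!r}"
                )
            elif pool_slots[task.pool] == 0:
                errors.setdefault(dag_id, []).append(
                    f"task {task.task_id!r} uses pool {task.pool!r} with 0 slots"
                )
    return errors
```

The webserver could then flash each entry the same way it flashes `import_errors` in the view linked above.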
   
> Also, since by default, if a task doesn't have a pool, it's assigned the `default_pool`, I think it's OK to raise an import error if a task has a wrong pool. This is something that can cause issues for the scheduler, and it is a user error; that's why I think raising the error is better.
   
Yeah, I'm just thinking about the local developer experience.

### idea

You know, maybe another way to address this concern while keeping your approach would be something like `AIRFLOW__SCHEDULER__IGNORE_POOLS`, which you could set to True for local dev; then this would be a non-issue there. Locally the dag parses and runs, and there's no need to worry about pools.
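
Roughly like this (neither the option nor the helper exists today; `conf.getboolean` is the real config accessor, everything else is hypothetical):

```python
# Hypothetical [scheduler] ignore_pools option, settable via the env var
# AIRFLOW__SCHEDULER__IGNORE_POOLS=True for local development.
from airflow.configuration import conf
from airflow.exceptions import AirflowException


def validate_task_pools(dag, known_pools):
    if conf.getboolean("scheduler", "ignore_pools", fallback=False):
        return  # local dev: skip pool validation entirely
    for task in dag.tasks:
        if task.pool not in known_pools:
            raise AirflowException(
                f"task {task.task_id!r} references unknown pool {task.pool!r}"
            )
```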
   
### another idea

What if, instead of failing in this scenario, we _create_ the pool with a default size of 10 or something?
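
As a sketch (the helper name and the default slot count are arbitrary; `Pool` is the real model, and this auto-creation behavior is just a proposal):

```python
# Sketch: lazily create a missing pool instead of raising an import error.
from airflow.models.pool import Pool

DEFAULT_AUTO_POOL_SLOTS = 10  # arbitrary default, per the suggestion above


def ensure_pool_exists(pool_name, session):
    pool = session.query(Pool).filter(Pool.pool == pool_name).one_or_none()
    if pool is None:
        pool = Pool(
            pool=pool_name,
            slots=DEFAULT_AUTO_POOL_SLOTS,
            description="auto-created for a task that referenced it",
        )
        session.add(pool)
        session.commit()
    return pool
```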

