dstandish commented on pull request #21829: URL: https://github.com/apache/airflow/pull/21829#issuecomment-1057405118
> > I think it's a good idea to improve messaging around this, but I'm not sure of the best way. I think that ideally we allow the dag to exist, but maybe just bubble up a "flash" message alerting with the offending dag and task.
>
> The problem with the dag existing is that the scheduler would keep on trying to queue it. If you check the linked issue, it blocks some other tasks from being queued.

Ah, I see. Well, alternatively we could amend the query to filter on `pool in pools`, or join to the pools table.

> > One reason is, if you are developing locally, you might want to use `airflow dags list` to verify your dag parses ok. Or you might want to run `airflow dags test` or `airflow tasks test` to run your task, without creating the pool. And what if perhaps you remove the pool to "temporarily disable" a set of tasks? Maybe in this case it's better to alert about "misconfiguration" but allow the dag to remain in the system.
>
> I thought about not creating the dagrun if a task has a wrong pool, but it looks like raising an error at that point could keep the scheduler crash looping.

I'm not sure we'd need to throw an error there. Couldn't we just not create the dag run? Or in any case, if we filter on "has an existent pool" in the query above, it wouldn't even be _considered_ for scheduling.

Perhaps alongside `import_errors` we could add an attribute `configuration_errors` or something to DagBag, and then use it to bubble up a flash alert like [here](https://github.com/apache/airflow/blob/08575ddd8a72f96a3439f73e973ee9958188eb83/airflow/www/views.py#L784-L788). If we had a `configuration_errors` thing, this would not be a hard error but something we'd want to warn the user about. In this scenario we could also warn if a pool has size 0. Thinking out loud a bit here.

> Also, since by default, if a task doesn't have a pool, it's assigned the `default_pool`, I think it's OK to raise an import error if a task has a wrong pool.
> This is something that can cause issues for the scheduler, and it is a user error; that's why I think raising the error is better.

Yeah, I'm just thinking about the local developer experience.

### idea

You know, maybe another way to address this concern while keeping your approach would be something like `AIRFLOW__SCHEDULER__IGNORE_POOLS`, which you could set to True for local dev, and then this would be a non-issue there. Locally the dag parses, and it runs, and there's no need to worry about pools.

### another idea

What if, instead of failing in this scenario, we _create_ the pool with a default size of 10 or something?
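To make the "create the pool with a default size" idea concrete, here is a minimal sketch. The names `ensure_pool` and `DEFAULT_SLOTS` are hypothetical illustrations, not Airflow API; in Airflow the pool registry lives in the metadata database rather than an in-memory dict.

```python
# Sketch of the "auto-create missing pools" idea: instead of raising an
# import error when a task references an unknown pool, register the pool
# with a default slot count. All names here are illustrative stand-ins.
DEFAULT_SLOTS = 10
pools = {"default_pool": 128}  # stand-in for the pools table

def ensure_pool(name: str) -> int:
    """Return the pool's slot count, creating it with a default if missing."""
    if name not in pools:
        # A real implementation might also log a warning or surface a
        # "configuration_errors"-style notice to the user here.
        pools[name] = DEFAULT_SLOTS
    return pools[name]

print(ensure_pool("ml_pool"))       # missing pool created on demand -> 10
print(ensure_pool("default_pool"))  # existing pool untouched -> 128
```

The trade-off being debated above is visible in the sketch: the dag keeps running, but a typo in a pool name silently creates a new pool instead of failing loudly.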

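The earlier suggestion to "amend the query to filter on `pool in pools` or join to the pools table" can also be sketched. This uses plain SQLite with simplified stand-in tables, not Airflow's real schema or its SQLAlchemy models; the point is only the shape of the join.

```python
import sqlite3

# Illustrative sketch: only consider tasks whose pool actually exists, so a
# task pointing at a missing pool is simply never picked up for scheduling,
# rather than crashing the scheduler or blocking other tasks.
# Table and column names are simplified stand-ins for Airflow's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slot_pool (pool TEXT PRIMARY KEY, slots INTEGER);
    CREATE TABLE task_instance (task_id TEXT, pool TEXT, state TEXT);
    INSERT INTO slot_pool VALUES ('default_pool', 128);
    INSERT INTO task_instance VALUES
        ('ok_task', 'default_pool', 'scheduled'),
        ('bad_task', 'missing_pool', 'scheduled');
""")

# Joining to the pools table excludes tasks that reference a nonexistent
# pool, so they are never _considered_ for scheduling.
rows = conn.execute("""
    SELECT ti.task_id
    FROM task_instance ti
    JOIN slot_pool p ON p.pool = ti.pool
    WHERE ti.state = 'scheduled'
""").fetchall()
print(rows)  # only ('ok_task',) survives the join
```

With this approach the misconfigured task would still need to be surfaced to the user somehow (e.g. the `configuration_errors` idea above), since it would otherwise sit unscheduled without explanation.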