dstandish commented on pull request #21829: URL: https://github.com/apache/airflow/pull/21829#issuecomment-1057646262
> > Ah i see. Well, alternatively we could amend the query to filter on pool in pools or join to pools table. > > In this case, the user won't know why the scheduler is not scheduling the task with non-existent pool. We'd still want to alert users to the bad configuration. Was just saying maybe don't fail the dag parse. E.g. along with `import_errors`, store `configuration_errors` or something. > > I'm not sure we'd need to throw an error there. Couldn't we just not create the dag run? Or in any case, if we filter on "has existent pool" in the query above, it wouldn't even be considered for scheduling. > > The Dag would keep coming up for scheduler to create dagrun even when ignored and would block other eligible dags from creating dagruns I guess it's not dag run (since they don't have pools) but task instance... But anyway if we join to pool they'd be filtered out right? Anyway just an idea. > > Perhaps alongside import_errors we could add an attribute configuration_errors or something to DagBag and then use this to bubble up a flash alert like [here](https://github.com/apache/airflow/blob/08575ddd8a72f96a3439f73e973ee9958188eb83/airflow/www/views.py#L784-L788). > > If we had a configuration_errors thing, this would not be a hard error but something we'd want to warn user about. In this scenario we could also warn if pool has size 0. Thinking out loud a bit here. > > We currently have import error is pool size is less than 0, see [here](https://github.com/apache/airflow/blob/5b45a78dca284e504280964951f079fca1866226/airflow/models/baseoperator.py#L783-L785). That's still why I think we should have this as import error. I could have verified the pool name below the above verification but because we still commit when we use `provide_session`, the scheduler would crash. That is something slightly different. That's how many slots that particular task should be configured to occupy. E.g. if you have 10 slots in your "sql server" queue, and you have a very heavy query, you could have it take up 5 slots -- but it's different from pool size. > I may not be understanding fully well the idea you are proposing. If you don't mind, you can create a PR so we can compare the two? I'm just concerned that making this an import error will negatively impact developer experience, and I'm trying to help think about alternatives that achieve similar goals. Because I think locally it's common you won't have all the pools, so it feels ilke it should be a warning of some kind but not a hard error (and if you just do a warning, presumably queries could be modified to filter out TIs with bad pools). But if I think (1) being able to disable consideration of pools would make this work. And another thing that I think would be really good would be (2) to make it so you didn't need to create the pools in the db e.g. if they could be defined in `airflow.cfg` or if you could do `AIRFLOW_POOL_MY_POOL=1`. With either or both of these in place, I think raising would be ok. It's easier to manage for local developers than having to add the pools to the db, and it would not be affected negatively by resetting the db. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
